<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.yambo-code.eu/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Giacomo.sesti</id>
	<title>The Yambo Project - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.yambo-code.eu/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Giacomo.sesti"/>
	<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Special:Contributions/Giacomo.sesti"/>
	<updated>2026-05-17T15:28:08Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.39.8</generator>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Modena_2025&amp;diff=8788</id>
		<title>Modena 2025</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Modena_2025&amp;diff=8788"/>
		<updated>2025-05-20T13:47:17Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* DAY 3 - Wednesday, 21 May */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A general description of the goal(s) of the school can be found on the [https://www.yambo-code.eu/2025/01/17/yambo-school-modena-2025/ Yambo main website]&lt;br /&gt;
&lt;br /&gt;
== Use CINECA computational resources ==&lt;br /&gt;
Yambo tutorials will be run on the Leonardo-DCGP partition. You can find info about Leonardo-DCGP [https://wiki.u-gov.it/confluence/display/SCAIUS/DCGP+Section here].&lt;br /&gt;
In order to access the computational resources provided by CINECA, you need the personal username and password that were sent to you by the organizers.&lt;br /&gt;
&lt;br /&gt;
=== Connect to the cluster using ssh ===&lt;br /&gt;
&lt;br /&gt;
You can access Leonardo via the &amp;lt;code&amp;gt;ssh&amp;lt;/code&amp;gt; protocol in different ways.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Connect using username and password &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use the following command replacing your username:&lt;br /&gt;
 ssh username@login.leonardo.cineca.it&lt;br /&gt;
&lt;br /&gt;
However, in this way you have to type your password each time you want to connect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Connect using ssh key &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You can set up an ssh key pair to avoid typing the password each time you want to connect to Leonardo. To do so, run the &amp;lt;code&amp;gt;ssh-keygen&amp;lt;/code&amp;gt; command to generate a private/public key pair:&lt;br /&gt;
 ssh-keygen -t rsa -b 4096 -f ~/.ssh/leonardo_rsa&lt;br /&gt;
 Generating public/private rsa key pair.&lt;br /&gt;
 Created directory &#039;/home/username/.ssh&#039;.&lt;br /&gt;
 Enter passphrase (empty for no passphrase): &lt;br /&gt;
 Enter same passphrase again: &lt;br /&gt;
 Your identification has been saved in /home/username/.ssh/leonardo_rsa&lt;br /&gt;
 Your public key has been saved in /home/username/.ssh/leonardo_rsa.pub&lt;br /&gt;
 The key fingerprint is:&lt;br /&gt;
 [...]&lt;br /&gt;
 The key&#039;s randomart image is:&lt;br /&gt;
 [...]&lt;br /&gt;
&lt;br /&gt;
Now you need to copy the &#039;&#039;&#039;public&#039;&#039;&#039; key to Leonardo. You can do that with the following command (for this step you need to type your password):&lt;br /&gt;
 ssh-copy-id -i ~/.ssh/leonardo_rsa username@login.leonardo.cineca.it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;&lt;br /&gt;
Be aware that when running the &amp;lt;code&amp;gt;ssh-copy-id&amp;lt;/code&amp;gt; command, after typing &amp;quot;yes&amp;quot; at the prompt, you might see an error message like the one shown below. Don’t worry—just follow the instructions provided in this CINECA [https://wiki.u-gov.it/confluence/display/SCAIUS/FAQ#FAQ-Ikeepreceivingtheerrormessage%22WARNING:REMOTEHOSTIDENTIFICATIONHASCHANGED!%22evenifImodifyknown_hostfile guide to resolve the issue]. Once done, run the &amp;lt;code&amp;gt;ssh-copy-id&amp;lt;/code&amp;gt; command again.&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
 /usr/bin/ssh-copy-id: &lt;br /&gt;
 ERROR: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
 ERROR: @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @&lt;br /&gt;
 ERROR: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
 ERROR: IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!&lt;br /&gt;
 [...]&lt;br /&gt;
&lt;br /&gt;
Once the public key has been copied, you can connect to Leonardo without having to type the password using the &amp;lt;code&amp;gt;-i&amp;lt;/code&amp;gt; option:&lt;br /&gt;
 ssh -i ~/.ssh/leonardo_rsa username@login.leonardo.cineca.it&lt;br /&gt;
&lt;br /&gt;
To simplify things even more, you can paste the following lines into a file named &amp;lt;code&amp;gt;config&amp;lt;/code&amp;gt; inside the &amp;lt;code&amp;gt;.ssh&amp;lt;/code&amp;gt; directory, adjusting the username:&lt;br /&gt;
 Host leonardo &lt;br /&gt;
  HostName login.leonardo.cineca.it&lt;br /&gt;
  User username&lt;br /&gt;
  IdentityFile ~/.ssh/leonardo_rsa&lt;br /&gt;
&lt;br /&gt;
With the &amp;lt;code&amp;gt;config&amp;lt;/code&amp;gt; file set up, you can connect simply with&lt;br /&gt;
 ssh leonardo&lt;br /&gt;
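&lt;br /&gt;
The same alias also works for file transfers with &amp;lt;code&amp;gt;scp&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt;, which may be handy during the tutorials (the remote paths below are only placeholders, adjust them to your case):&lt;br /&gt;
 scp local_file.tar.gz leonardo:/path/on/leonardo/&lt;br /&gt;
 rsync -avzP leonardo:/path/on/leonardo/results.tar.gz .&lt;br /&gt;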
&lt;br /&gt;
=== General instructions to run tutorials ===&lt;br /&gt;
&lt;br /&gt;
Before proceeding, it is useful to know the different workspaces you have available on Leonardo, which can be accessed using environment variables. The main ones are:&lt;br /&gt;
* &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;: the &amp;lt;code&amp;gt;home&amp;lt;/code&amp;gt; directory associated with your username;&lt;br /&gt;
* &amp;lt;code&amp;gt;$WORK&amp;lt;/code&amp;gt;: the &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt; directory associated with the account where the computational resources dedicated to this school are allocated;&lt;br /&gt;
* &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;: the &amp;lt;code&amp;gt;scratch&amp;lt;/code&amp;gt; directory associated with your username.&lt;br /&gt;
You can find more details about storage and FileSystems [https://wiki.u-gov.it/confluence/display/SCAIUS/4%3A+Data+storage+and+FileSystems here].&lt;br /&gt;
&lt;br /&gt;
Please don&#039;t forget to &#039;&#039;&#039;run all tutorials in your scratch directory&#039;&#039;&#039;:&lt;br /&gt;
 echo $SCRATCH&lt;br /&gt;
 /leonardo_scratch/large/userexternal/username&lt;br /&gt;
 cd $SCRATCH&lt;br /&gt;
&lt;br /&gt;
Computational resources on Leonardo are managed by the job scheduling system [https://slurm.schedmd.com/overview.html Slurm]. Most of the Yambo tutorials during this school can be run in serial, except for a few that need to be executed on multiple processors. Slurm batch jobs are generally submitted using a script, but the tutorials here are better understood if run interactively. The two procedures that we will use to submit non-interactive and interactive jobs are explained below.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Run a job using a batch script &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This procedure is suggested for the tutorials and examples that need to be run in parallel. In these cases you need to submit the job using a batch script &amp;lt;code&amp;gt;job.sh&amp;lt;/code&amp;gt;, whose generic structure is the following:&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra25_yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:10:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=JOB                  # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=dcgp_usr_prod       # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --gres=tmpfs:10g                # List of generic consumable resources&lt;br /&gt;
 #SBATCH --qos=normal                    # Quality of service &lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school&lt;br /&gt;
 #          &lt;br /&gt;
 #SBATCH --nodes=&amp;lt;N&amp;gt;                     # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=&amp;lt;n&amp;gt;           # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --ntasks-per-socket=&amp;lt;n/2&amp;gt;       # Tasks invoked on each socket&lt;br /&gt;
 #SBATCH --cpus-per-task=&amp;lt;c&amp;gt;             # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 mpirun -np ${SLURM_NTASKS} \&lt;br /&gt;
        yambo -F &amp;lt;input&amp;gt; -J &amp;lt;output&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please note that the instructions in the batch script must be compatible with the specific Leonardo-DCGP [https://wiki.u-gov.it/confluence/display/SCAIUS/DCGP+Section#DCGPSection-SLURMpartitions resources]. The complete list of Slurm options can be found [https://slurm.schedmd.com/sbatch.html here]. However, you will find &#039;&#039;&#039;ready-to-use&#039;&#039;&#039; batch scripts in locations specified during the tutorials.&lt;br /&gt;
&lt;br /&gt;
To submit the job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 sbatch job.sh&lt;br /&gt;
 Submitted batch job 15696508&lt;br /&gt;
&lt;br /&gt;
To check the job status, use the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 squeue --me&lt;br /&gt;
            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
         15696508 dcgp_usr_   job.sh username  R       0:01      1 lrdn4135&lt;br /&gt;
&lt;br /&gt;
If you need to cancel your job, do:&lt;br /&gt;
 scancel &amp;lt;JOBID&amp;gt; &lt;br /&gt;
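&lt;br /&gt;
To get more details on a queued or running job, or basic accounting information after it has finished, you can also use the standard Slurm commands below (shown here as optional extras):&lt;br /&gt;
 scontrol show job &amp;lt;JOBID&amp;gt;&lt;br /&gt;
 sacct -j &amp;lt;JOBID&amp;gt;&lt;br /&gt;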
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Open an interactive session &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This procedure is suggested for most of the tutorials, since the majority of them are meant to be run in serial (with respect to MPI parallelization) from the command line. Use the command below to open an interactive session of 4 hours:&lt;br /&gt;
 srun -A tra25_yambo -p dcgp_usr_prod --reservation=s_tra_yambo -N 1 -n 1 -c 4 -t 04:00:00 --gres=tmpfs:10g --pty /bin/bash&lt;br /&gt;
 srun: job 15694182 queued and waiting for resources&lt;br /&gt;
 srun: job 15694182 has been allocated resources&lt;br /&gt;
&lt;br /&gt;
We ask for 4 cpus-per-task (-c) because we can exploit OpenMP parallelization with the available resources.&lt;br /&gt;
&lt;br /&gt;
Then, you need to manually load &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; as in the batch script above:&lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
&lt;br /&gt;
Finally, set the &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; environment variable to 4 using the appropriate Slurm environment variable:&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
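&lt;br /&gt;
You can quickly check that the variable has been set as expected (it should print 4 for the session requested above):&lt;br /&gt;
 echo $OMP_NUM_THREADS&lt;br /&gt;
 4&lt;br /&gt;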
&lt;br /&gt;
To close the interactive session when you have finished, log out of the compute node with the &amp;lt;code&amp;gt;exit&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 exit&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Plot results with gnuplot &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
During the tutorials you will often need to plot the results of the calculations. In order to do so on Leonardo, &#039;&#039;&#039;open a new terminal window&#039;&#039;&#039; and connect to Leonardo enabling X11 forwarding with the &amp;lt;code&amp;gt;-X&amp;lt;/code&amp;gt; option:&lt;br /&gt;
 ssh -X leonardo&lt;br /&gt;
&lt;br /&gt;
Please note that &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; can be used in this way only from the login nodes:&lt;br /&gt;
 username@&#039;&#039;&#039;login01&#039;&#039;&#039;$ cd &amp;lt;directory_with_data&amp;gt;&lt;br /&gt;
 username@&#039;&#039;&#039;login01&#039;&#039;&#039;$ gnuplot&lt;br /&gt;
 ...&lt;br /&gt;
 Terminal type is now &#039;...&#039;&lt;br /&gt;
 gnuplot&amp;gt; plot &amp;lt;...&amp;gt;&lt;br /&gt;
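&lt;br /&gt;
As a concrete, purely illustrative example, assuming a two-column output file named &amp;lt;code&amp;gt;o-data&amp;lt;/code&amp;gt; (a placeholder name) with energies in the first column and the spectrum in the second, a typical plot command would be:&lt;br /&gt;
 gnuplot&amp;gt; plot &#039;o-data&#039; using 1:2 with lines&lt;br /&gt;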
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Set up yambopy &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
In order to run yambopy on Leonardo, you must first activate the python environment:&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 source /leonardo_work/tra25_yambo/env_yambopy/bin/activate&lt;br /&gt;
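&lt;br /&gt;
A quick way to check that the environment is active is to import the package from Python (this should print the installation path without errors):&lt;br /&gt;
 python -c &amp;quot;import yambopy; print(yambopy.__file__)&amp;quot;&lt;br /&gt;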
&lt;br /&gt;
== Tutorials ==&lt;br /&gt;
&lt;br /&gt;
Quick recap: before every tutorial, if you run on Leonardo, do the following steps&lt;br /&gt;
&lt;br /&gt;
 ssh leonardo&lt;br /&gt;
 cd $SCRATCH&lt;br /&gt;
 mkdir -p YAMBO_TUTORIALS &#039;&#039;&#039;#(Only if you didn&#039;t before)&#039;&#039;&#039;&lt;br /&gt;
 cd YAMBO_TUTORIALS&lt;br /&gt;
&lt;br /&gt;
Since the compute nodes are not connected to the external network, the tarballs must be downloaded before starting the interactive session.&lt;br /&gt;
Alternatively, once the interactive session has started, it is possible to access the tarballs by copying them from the following directories:&lt;br /&gt;
&lt;br /&gt;
 /leonardo_work/tra25_yambo/YAMBO_TUTORIALS&lt;br /&gt;
 /leonardo_work/tra25_yambo/YAMBOPY_TUTORIALS&lt;br /&gt;
&lt;br /&gt;
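For example, to copy one of the tarballs from the shared area into your current directory (assuming the tarball name, e.g. &amp;lt;code&amp;gt;hBN.tar.gz&amp;lt;/code&amp;gt;, is present there; the daily instructions below list the files you need):&lt;br /&gt;
 rsync -avzP /leonardo_work/tra25_yambo/YAMBO_TUTORIALS/hBN.tar.gz .&lt;br /&gt;
&lt;br /&gt;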
After that, you can start the interactive session&lt;br /&gt;
&lt;br /&gt;
 srun -A tra25_yambo -p dcgp_usr_prod --reservation=s_tra_yambo -N 1 -n 1 -c 4 -t 04:00:00 --gres=tmpfs:10g --pty /bin/bash&lt;br /&gt;
 [...]&lt;br /&gt;
&lt;br /&gt;
set the environment variable for OpenMP&lt;br /&gt;
&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
&lt;br /&gt;
and load yambo or yambopy as explained above in the general instructions.&lt;br /&gt;
&lt;br /&gt;
=== DAY 1 - Monday, 19 May === &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;16:15 - 18:30 From the DFT ground state to the complete setup of a Many Body calculation using Yambo&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To get the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-2D.tar.gz&lt;br /&gt;
 $ ls&lt;br /&gt;
 hBN-2D.tar.gz  hBN.tar.gz&lt;br /&gt;
 $ tar -xvf hBN-2D.tar.gz&lt;br /&gt;
 $ tar -xvf hBN.tar.gz&lt;br /&gt;
 $ ls&lt;br /&gt;
 &#039;&#039;&#039;hBN-2D&#039;&#039;&#039; &#039;&#039;&#039;hBN&#039;&#039;&#039; hBN-2D.tar.gz  hBN.tar.gz&lt;br /&gt;
&lt;br /&gt;
Now that you have all the files, you may open the interactive job session with &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; as explained above and proceed with the tutorials.&lt;br /&gt;
&lt;br /&gt;
* [[First steps: walk through from DFT(standalone)|First steps: Initialization and more ]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 2 - Tuesday, 20 May ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;14:30 - 16:30 Linear response&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[Next steps: RPA calculations (standalone)|Next steps: RPA calculations ]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;17:00 - 18:30 Introduction to Yambopy&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
At this point, you may learn about the Python pre- and post-processing capabilities offered by yambopy, our Python interface to yambo and QE. First of all, let&#039;s create a dedicated directory, then copy and extract the related files.&lt;br /&gt;
 &lt;br /&gt;
 $ cd $SCRATCH&lt;br /&gt;
 $ mkdir -p YAMBOPY_TUTORIALS&lt;br /&gt;
 $ cd YAMBOPY_TUTORIALS&lt;br /&gt;
 $ rsync -avzP /leonardo_work/tra25_yambo/YAMBOPY_TUTORIALS/yambopy_tutorial_Modena_2025.tar.gz .&lt;br /&gt;
 $ tar --strip-components=1 -xvzf yambopy_tutorial_Modena_2025.tar.gz&lt;br /&gt;
&lt;br /&gt;
Then, follow part 1 of the tutorial, which is related to DFT band structures, YAMBO initialization and linear response.&lt;br /&gt;
* [[Modena 2025 : Yambopy part 1]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 3 - Wednesday, 21 May ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;11:30 - 12:30 | 14:30 - 16:30 A tour through GW simulation in a complex material (from the blackboard to numerical computation: convergence, algorithms, parallel usage)&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To get all the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/MoS2_2Dquasiparticle_tutorial_Modena2025.tar.gz&lt;br /&gt;
 $ tar -xvf hBN.tar.gz&lt;br /&gt;
 $ tar -xvf MoS2_2Dquasiparticle_tutorial_Modena2025.tar.gz&lt;br /&gt;
 $ cd hBN&lt;br /&gt;
&lt;br /&gt;
Now you can start the first tutorial:&lt;br /&gt;
&lt;br /&gt;
* [[GW tutorial Rome 2023 | GW computations on practice: how to obtain the quasi-particle band structure of a bulk material ]]&lt;br /&gt;
&lt;br /&gt;
If you have gone through the first tutorial, pass now to the second one:&lt;br /&gt;
 &lt;br /&gt;
 $ cd $SCRATCH&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ cd MoS2_2Dquasiparticle_tutorial_Modena2025&lt;br /&gt;
&lt;br /&gt;
* [[Quasi-particles of a 2D system | Quasi-particles of a 2D system ]]&lt;br /&gt;
&lt;br /&gt;
As for yambopy, the tutorial related to GW calculations is contained in the first section of Part 2.&lt;br /&gt;
&lt;br /&gt;
* [[Modena 2025 : Yambopy part 2#GW calculations| Modena 2025 : Yambopy part 2 (GW calculations)]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;17:00 - 18:30 Bethe-Salpeter equation (BSE)&#039;&#039;&#039; Fulvio Paleari (CNR-Nano, Italy), Davide Sangalli (CNR-ISM, Italy)&lt;br /&gt;
&lt;br /&gt;
To get the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz # NOTE: YOU SHOULD ALREADY HAVE THIS FROM DAY 1&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-convergence-kpoints.tar.gz &lt;br /&gt;
 $ tar -xvf hBN-convergence-kpoints.tar.gz&lt;br /&gt;
 $ tar -xvf hBN.tar.gz&lt;br /&gt;
&lt;br /&gt;
Now, you may open the interactive job session with &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; as explained above and proceed with the following tutorials.&lt;br /&gt;
&lt;br /&gt;
* [[Calculating optical spectra including excitonic effects: a step-by-step guide|Perform a BSE calculation from beginning to end ]]&lt;br /&gt;
* [[How to analyse excitons - ICTP 2022 school|Analyse your results (exciton wavefunctions in real and reciprocal space, etc.) ]]&lt;br /&gt;
* [[BSE solvers overview|Solve the BSE eigenvalue problem with different numerical methods]]&lt;br /&gt;
* [[How to choose the input parameters|Choose the input parameters for a meaningful converged calculation]]&lt;br /&gt;
&lt;br /&gt;
Now, go into the yambopy tutorial directory to learn about python analysis tools for the BSE:&lt;br /&gt;
 $ cd $SCRATCH&lt;br /&gt;
 $ cd YAMBOPY_TUTORIALS/databases_yambopy&lt;br /&gt;
&lt;br /&gt;
* [[Modena 2025 : Yambopy part 2#Excitons| Modena 2025 : Yambopy part 2 (BSE calculations)]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 4 - Thursday, May 22 ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;14:30 - 16:00 Bethe-Salpeter (part 2)&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;16:30 - 17:30 Nonlinear response with the time dependent berry phase&#039;&#039;&#039; Myrta Gruning (Queen&#039;s University Belfast), Davide Sangalli (CNR-ISM, Italy)&lt;br /&gt;
&lt;br /&gt;
For the tutorials we will first use the &amp;lt;code&amp;gt;hBN-2D-RT&amp;lt;/code&amp;gt; folder (k-sampling 10x10x1) and then the &amp;lt;code&amp;gt;hBN-2D&amp;lt;/code&amp;gt; folder (k-sampling 6x6x1).&lt;br /&gt;
You may already have them in the &amp;lt;code&amp;gt;YAMBO_TUTORIALS&amp;lt;/code&amp;gt; folder:&lt;br /&gt;
 $ ls&lt;br /&gt;
 &#039;&#039;&#039;hBN-2D&#039;&#039;&#039; &#039;&#039;&#039;hBN-2D-RT&#039;&#039;&#039; hBN-2D.tar.gz  hBN-2D-RT.tar.gz&lt;br /&gt;
&lt;br /&gt;
If you need to download the tutorial files again, follow these steps (or see the instructions above):&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-2D.tar.gz&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-2D-RT.tar.gz&lt;br /&gt;
 $ tar -xvf hBN-2D.tar.gz&lt;br /&gt;
 $ tar -xvf hBN-2D-RT.tar.gz&lt;br /&gt;
&lt;br /&gt;
* [[Prerequisites for Real Time propagation with Yambo (5.3)|Perform the setup for a real-time calculation]]&lt;br /&gt;
* [[Dielectric function from Bloch-states dynamics (5.3)|Dielectric function from Bloch-states dynamics]]&lt;br /&gt;
* [[Second-harmonic generation of 2D-hBN (5.3)|Second-harmonic generation of 2D-hBN]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 5 - Friday, 23 May ===&lt;br /&gt;
&lt;br /&gt;
== Lectures ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== DAY 1 - Monday, 19 May ===&lt;br /&gt;
&lt;br /&gt;
* D. Varsano, [https://drive.google.com/file/d/1lbY6zF04WCcvZZhQy4TAIBca9wXHVmGG/view?usp=share_link Description and goal of the school].&lt;br /&gt;
* C. Franchini, [https://drive.google.com/file/d/1Z6GCjP4K1dM28ULsyYg2eckgUdYUSRph/view?usp=share_link First principles and data-driven correlated materials]&lt;br /&gt;
* F. Mohamed, [https://drive.google.com/file/d/1ITddkGTM12Gw5QxnZjAQpfZgYH0FvJL1/view?usp=share_link A tour on Density Functional Theory]&lt;br /&gt;
* E. Cannuccia, [https://drive.google.com/file/d/1mBTcPrnfoqwcA5wXE8gXQMO_qttClHAd/view?usp=share_link Electronic screening and linear response theory]&lt;br /&gt;
&lt;br /&gt;
=== DAY 2 - Tuesday, 20 May ===&lt;br /&gt;
&lt;br /&gt;
* A. Marini, [https://drive.google.com/file/d/1HTIPHkH2sBaVDLFwwS34T-fJ9x8FhVPq/view?usp=share_link Introduction to Many-Body Perturbation Theory]&lt;br /&gt;
* C. Cardoso, [https://drive.google.com/file/d/1SR9BtFKgz6Y1gaHSKF1s8xzb42D5C1Xg/view?usp=share_link Quasiparticles and the GW Approximation]&lt;br /&gt;
* A. Guandalini, [https://drive.google.com/file/d/1dgcdHMfA0b7jjyrCs4r9OrG6qpiu1v39/view?usp=share_link GW in practice: algorithms and approximations]&lt;br /&gt;
* G. Sesti, [https://drive.google.com/file/d/1te_85k9jgSymr3Av86rKOu0-tA-7sGWq/view?usp=sharing  GW advanced algorithms]&lt;br /&gt;
* M. Govoni, [https://drive.google.com/file/d/1XBa5RgmwKdYSy4mj_COXwbUQd3DPgRe4/view?usp=share_link GW without empty states and investigation of neutral excitations by embedding full configuration interaction in DFT+GW]&lt;br /&gt;
&lt;br /&gt;
=== DAY 3 - Wednesday, 21 May ===&lt;br /&gt;
&lt;br /&gt;
* M. Palummo, [https://drive.google.com/file/d/1pQ491hqpETVLchL92QPy4f_jWqfMK5xf/view?usp=share_link Optical absorption and excitons via the Bethe-Salpeter Equation]&lt;br /&gt;
* D. Sangalli, Real-time simulations&lt;br /&gt;
* F. Paleari, Introduction to YamboPy (automation and post-processing)&lt;br /&gt;
&lt;br /&gt;
=== DAY 4 - Thursday, 22 May ===&lt;br /&gt;
&lt;br /&gt;
* E. Luppi, An introduction to Non-linear spectroscopy&lt;br /&gt;
* M. Grüning, [https://drive.google.com/file/d/1bZF0f3AD-WL3M3vCtvrnA_1W94SKt-Gf/view?usp=sharing Non-linear spectroscopy in Yambo]&lt;br /&gt;
* F. Affinito, Frontiers in High-Performance Computing&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8787</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8787"/>
		<updated>2025-05-20T13:46:45Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to run a GW simulation efficiently in a 2D material, based on:&lt;br /&gt;
*GW acceleration techniques, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms available in Yambo that speed up the convergence of the self-energy with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt;01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already converged. Now compute the gap at the GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
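&lt;br /&gt;
If you want to extract the quasiparticle energies quickly from the command line, a small sketch like the one below can help. It assumes the usual column layout of the &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; file, with Eo in the third column and E-Eo in the fourth; double-check the header of your file:&lt;br /&gt;
 grep -v &#039;^#&#039; o-80b_10Ry.qp | awk &#039;{printf &amp;quot;k=%s band=%s E_QP=%8.3f eV\n&amp;quot;, $1, $2, $3+$4}&#039;&lt;br /&gt;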
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the convergence with the k-grid and achieve a speed-up, briefly consider again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually, Yambo performs this integration analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral outside the sphere. This is usually not a problem, because for a large number of q- and k-points the missing term goes to zero. For systems that require only a few k-points, or even just the gamma point, it is possible to perform a better integration of this term by adding the &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo.&lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic.&lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with the gamma point only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed that performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation using the RIM-W algorithm.&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when performing a RIM integration of only the bare Coulomb potential in the correlation part of the self-energy.&lt;br /&gt;
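&lt;br /&gt;
To summarize the three runs of this step (values quoted above):&lt;br /&gt;
 Run                          GW gap (eV)&lt;br /&gt;
 80b_10Ry       (standard)    2.483&lt;br /&gt;
 80b_10Ry_rim   (RIM)         4.116&lt;br /&gt;
 80b_10Ry_rimw  (RIM-W)       2.806&lt;br /&gt;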
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure using a larger number of bands with respect to the previous calculation.&lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively; rather, we will submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial_Modena2025/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see that we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the Slurm submission script &amp;lt;code&amp;gt;job_parallel.sh&amp;lt;/code&amp;gt;, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;             # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0                # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;  # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;    # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1    # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0                  # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;             # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                 # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only MPI tasks and a single OpenMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra25_yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:04:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=rutile               # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=dcgp_usr_prod       # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --gres=tmpfs:10g                # List of generic consumable resources&lt;br /&gt;
 #SBATCH --qos=normal                    # Quality of service&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school&lt;br /&gt;
 #&lt;br /&gt;
 #SBATCH --nodes=1                       # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=4             # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --cpus-per-task=1               # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI4_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
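&lt;br /&gt;
Once the job has finished, you can look for the timing information near the end of the report file to compare different runs (the exact section name may vary between Yambo versions):&lt;br /&gt;
 grep -i -A 10 &amp;quot;Timing&amp;quot; run_MPI4_OMP1.out/r-*&lt;br /&gt;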
&lt;br /&gt;
On a cluster like Leonardo, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying &amp;lt;code&amp;gt;cpus-per-task&amp;lt;/code&amp;gt; in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs.&lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute the 4 CPUs over 2 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt;”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number directly specifies the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
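&lt;br /&gt;
Since command-line options passed to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; override the corresponding &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; directives, one way to scan the combinations without editing the script each time is, for example:&lt;br /&gt;
 sbatch --ntasks-per-node=4 --cpus-per-task=1 job_parallel.sh&lt;br /&gt;
 sbatch --ntasks-per-node=2 --cpus-per-task=2 job_parallel.sh&lt;br /&gt;
 sbatch --ntasks-per-node=1 --cpus-per-task=4 job_parallel.sh&lt;br /&gt;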
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, therefore we still have a total number of tasks equal to 8. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running over 4 GPU cards, distributed over 2 MPI tasks, each using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of 2 min 30 s for a purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Modena_2025&amp;diff=8786</id>
		<title>Modena 2025</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Modena_2025&amp;diff=8786"/>
		<updated>2025-05-20T13:46:06Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* DAY 3 - Wednesday, 21 May */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A general description of the goal(s) of the school can be found on the [https://www.yambo-code.eu/2025/01/17/yambo-school-modena-2025/ Yambo main website]&lt;br /&gt;
&lt;br /&gt;
== Use CINECA computational resources ==&lt;br /&gt;
Yambo tutorials will be run on the Leonardo-DCGP partition. You can find info about Leonardo-DCGP [https://wiki.u-gov.it/confluence/display/SCAIUS/DCGP+Section here].&lt;br /&gt;
In order to access computational resources provided by CINECA you need your personal username and password that were sent you by the organizers.&lt;br /&gt;
&lt;br /&gt;
=== Connect to the cluster using ssh ===&lt;br /&gt;
&lt;br /&gt;
You can access Leonardo via &amp;lt;code&amp;gt;ssh&amp;lt;/code&amp;gt; protocol in different ways.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Connect using username and password &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use the following command replacing your username:&lt;br /&gt;
 ssh username@login.leonardo.cineca.it&lt;br /&gt;
&lt;br /&gt;
However, in this way you have to type your password each time you want to connect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Connect using ssh key &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You can setup a ssh key pair to avoid typing the password each time you want to connect to Leonardo. To do so, run the &amp;lt;code&amp;gt;ssh-keygen&amp;lt;/code&amp;gt; command to generate a private/public key pair:&lt;br /&gt;
 ssh-keygen -t rsa -b 4096 -f ~/.ssh/leonardo_rsa&lt;br /&gt;
 Generating public/private rsa key pair.&lt;br /&gt;
 Created directory &#039;/home/username/.ssh&#039;.&lt;br /&gt;
 Enter passphrase (empty for no passphrase): &lt;br /&gt;
 Enter same passphrase again: &lt;br /&gt;
 Your identification has been saved in /home/username/.ssh/leonardo_rsa&lt;br /&gt;
 Your public key has been saved in /home/username/.ssh/leonardo_rsa.pub&lt;br /&gt;
 The key fingerprint is:&lt;br /&gt;
 [...]&lt;br /&gt;
 The key&#039;s randomart image is:&lt;br /&gt;
 [...]&lt;br /&gt;
&lt;br /&gt;
Now you need to copy the &#039;&#039;&#039;public&#039;&#039;&#039; key to Leonardo. You can do that with the following command (for this step you need to type your password):&lt;br /&gt;
 ssh-copy-id -i ~/.ssh/leonardo_rsa username@login.leonardo.cineca.it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;&lt;br /&gt;
Be aware that when running the &amp;lt;code&amp;gt;ssh-copy-id&amp;lt;/code&amp;gt; command, after typing &amp;quot;yes&amp;quot; at the prompt, you might see an error message like the one shown below. Don’t worry—just follow the instructions provided in this CINECA [https://wiki.u-gov.it/confluence/display/SCAIUS/FAQ#FAQ-Ikeepreceivingtheerrormessage%22WARNING:REMOTEHOSTIDENTIFICATIONHASCHANGED!%22evenifImodifyknown_hostfile guide to resolve the issue]. Once done, run the &amp;lt;code&amp;gt;ssh-copy-id&amp;lt;/code&amp;gt; command again.&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
 /usr/bin/ssh-copy-id: &lt;br /&gt;
 ERROR: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
 ERROR: @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @&lt;br /&gt;
 ERROR: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
 ERROR: IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!&lt;br /&gt;
 [...]&lt;br /&gt;
&lt;br /&gt;
Once the public key has been copied, you can connect to Leonardo without having to type the password using the &amp;lt;code&amp;gt;-i&amp;lt;/code&amp;gt; option:&lt;br /&gt;
 ssh -i ~/.ssh/leonardo_rsa username@login.leonardo.cineca.it&lt;br /&gt;
&lt;br /&gt;
To simplify even more, you can paste the following lines in a file named &amp;lt;code&amp;gt;config&amp;lt;/code&amp;gt; located inside the &amp;lt;code&amp;gt;.ssh&amp;lt;/code&amp;gt; directory adjusting the username:&lt;br /&gt;
 Host leonardo &lt;br /&gt;
  HostName login.leonardo.cineca.it&lt;br /&gt;
  User username&lt;br /&gt;
  IdentityFile ~/.ssh/leonardo_rsa&lt;br /&gt;
&lt;br /&gt;
With the &amp;lt;code&amp;gt;config&amp;lt;/code&amp;gt; file setup you can connect simply with&lt;br /&gt;
 ssh leonardo&lt;br /&gt;
&lt;br /&gt;
=== General instructions to run tutorials ===&lt;br /&gt;
&lt;br /&gt;
Before proceeding, it is useful to know the different workspaces you have available on Leonardo, which can be accessed using environment variables. The main ones are:&lt;br /&gt;
* &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;: it&#039;s the &amp;lt;code&amp;gt;home&amp;lt;/code&amp;gt; directory associated to your username; &lt;br /&gt;
* &amp;lt;code&amp;gt;$WORK&amp;lt;/code&amp;gt;: it&#039;s the &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt; directory associated to the account where the computational resources dedicated to this school are allocated;&lt;br /&gt;
* &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;: it&#039;s the &amp;lt;code&amp;gt;scratch&amp;lt;/code&amp;gt; directory associated to your username.&lt;br /&gt;
You can find more details about storage and FileSystems [https://wiki.u-gov.it/confluence/display/SCAIUS/4%3A+Data+storage+and+FileSystems here].&lt;br /&gt;
&lt;br /&gt;
Please don&#039;t forget to &#039;&#039;&#039;run all tutorials in your scratch directory&#039;&#039;&#039;:&lt;br /&gt;
 echo $SCRATCH&lt;br /&gt;
 /leonardo_scratch/large/userexternal/username&lt;br /&gt;
 cd $SCRATCH&lt;br /&gt;
&lt;br /&gt;
Computational resources on Leonardo are managed by the job scheduling system [https://slurm.schedmd.com/overview.html Slurm]. Most part of Yambo tutorials during this school can be run in serial, except some that need to be executed on multiple processors. Generally, Slurm batch jobs are submitted using a script, but the tutorials here are better understood if run interactively. The two procedures that we will use to submit interactive and non interactive jobs are explained below.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Run a job using a batch script &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This procedure is suggested for the tutorials and examples that need to be run in parallel. In these cases you need to submit the job using a batch script &amp;lt;code&amp;gt;job.sh&amp;lt;/code&amp;gt;, whose generic structure is the following:&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra25_yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:10:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=JOB                  # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition= dcgp_usr_prod      # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --gres=tmpfs:10g                # List of generic consumable resources&lt;br /&gt;
 #SBATCH --qos=normal                    # Quality of service &lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school&lt;br /&gt;
 #          &lt;br /&gt;
 #SBATCH --nodes=&amp;lt;N&amp;gt;                     # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=&amp;lt;n&amp;gt;           # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --ntasks-per-socket=&amp;lt;n/2&amp;gt;       # Tasks invoked on each socket&lt;br /&gt;
 #SBATCH --cpus-per-task=&amp;lt;c&amp;gt;             # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 mpirun -np ${SLURM_NTASKS} \&lt;br /&gt;
        yambo -F &amp;lt;input&amp;gt; -J &amp;lt;output&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please note that the instructions in the batch script must be compatible with the specific Leonardo-DCGP [https://wiki.u-gov.it/confluence/display/SCAIUS/DCGP+Section#DCGPSection-SLURMpartitions resources]. The complete list of Slurm options can be found [https://slurm.schedmd.com/sbatch.html here]. However, you will find &#039;&#039;&#039;ready-to-use&#039;&#039;&#039; batch scripts in the locations specified during the tutorials. &lt;br /&gt;
&lt;br /&gt;
To submit the job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 sbatch job.sh&lt;br /&gt;
 Submitted batch job 15696508&lt;br /&gt;
&lt;br /&gt;
To check the job status, use the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 squeue --me&lt;br /&gt;
            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
         15696508 dcgp_usr_   job.sh username  R       0:01      1 lrdn4135&lt;br /&gt;
&lt;br /&gt;
If you need to cancel your job, do:&lt;br /&gt;
 scancel &amp;lt;JOBID&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Open an interactive session &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This procedure is suggested for most of the tutorials, since the majority of them are meant to be run in serial (with respect to MPI parallelization) from the command line. Use the command below to open a 4-hour interactive session:&lt;br /&gt;
 srun -A tra25_yambo -p dcgp_usr_prod --reservation=s_tra_yambo -N 1 -n 1 -c 4 -t 04:00:00 --gres=tmpfs:10g --pty /bin/bash&lt;br /&gt;
 srun: job 15694182 queued and waiting for resources&lt;br /&gt;
 srun: job 15694182 has been allocated resources&lt;br /&gt;
&lt;br /&gt;
We ask for 4 cpus-per-task (-c 4) because we can exploit OpenMP parallelization with the available resources.&lt;br /&gt;
&lt;br /&gt;
Then, you need to manually load &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; as in the batch script above:&lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
&lt;br /&gt;
Finally, set the &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; environment variable to 4 using the appropriate Slurm environment variable:&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
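&lt;br /&gt;
As an optional sanity check, you can verify that you are on a compute node and that the environment is set up as expected:&lt;br /&gt;
 hostname&lt;br /&gt;
 echo $OMP_NUM_THREADS&lt;br /&gt;
 which yambo&lt;br /&gt;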
&lt;br /&gt;
To close the interactive session when you have finished, log out of the compute node with the &amp;lt;code&amp;gt;exit&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 exit&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Plot results with gnuplot &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
During the tutorials you will often need to plot the results of the calculations. In order to do so on Leonardo, &#039;&#039;&#039;open a new terminal window&#039;&#039;&#039; and connect to Leonardo enabling X11 forwarding with the &amp;lt;code&amp;gt;-X&amp;lt;/code&amp;gt; option:&lt;br /&gt;
 ssh -X leonardo&lt;br /&gt;
&lt;br /&gt;
Please note that &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; can be used in this way only from the login nodes:&lt;br /&gt;
 username@&#039;&#039;&#039;login01&#039;&#039;&#039;$ cd &amp;lt;directory_with_data&amp;gt;&lt;br /&gt;
 username@&#039;&#039;&#039;login01&#039;&#039;&#039;$ gnuplot&lt;br /&gt;
 ...&lt;br /&gt;
 Terminal type is now &#039;...&#039;&lt;br /&gt;
 gnuplot&amp;gt; plot &amp;lt;...&amp;gt;&lt;br /&gt;
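&lt;br /&gt;
For example, a minimal sketch of a plot command (the file name and columns are placeholders for one of the &amp;lt;code&amp;gt;o-*&amp;lt;/code&amp;gt; output files you will generate during the tutorials):&lt;br /&gt;
 gnuplot&amp;gt; plot &#039;o-some_output_file&#039; using 1:2 with lines   # placeholder file name and columns&lt;br /&gt;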
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Set up yambopy &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
In order to run yambopy on Leonardo, you must first activate the python environment:&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 source /leonardo_work/tra25_yambo/env_yambopy/bin/activate&lt;br /&gt;
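&lt;br /&gt;
As an optional check that the environment is active, you can try importing the package:&lt;br /&gt;
 python -c &amp;quot;import yambopy&amp;quot;&lt;br /&gt;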
&lt;br /&gt;
== Tutorials ==&lt;br /&gt;
&lt;br /&gt;
Quick recap: before every tutorial, if you are running on Leonardo, do the following steps:&lt;br /&gt;
&lt;br /&gt;
 ssh leonardo&lt;br /&gt;
 cd $SCRATCH&lt;br /&gt;
 mkdir -p YAMBO_TUTORIALS &#039;&#039;&#039;#(Only if you didn&#039;t before)&#039;&#039;&#039;&lt;br /&gt;
 cd YAMBO_TUTORIALS&lt;br /&gt;
&lt;br /&gt;
Since the compute nodes are not connected to the external network, the tarballs must be downloaded before starting the interactive session.&lt;br /&gt;
Alternatively, once the interactive session has started, it is possible to access the tarballs by copying them from the following directories:&lt;br /&gt;
&lt;br /&gt;
 /leonardo_work/tra25_yambo/YAMBO_TUTORIALS&lt;br /&gt;
 /leonardo_work/tra25_yambo/YAMBOPY_TUTORIALS&lt;br /&gt;
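&lt;br /&gt;
For example, assuming the hBN tarball is stored there under the same name used on the Yambo website, you could copy and extract it with:&lt;br /&gt;
 cp /leonardo_work/tra25_yambo/YAMBO_TUTORIALS/hBN.tar.gz .   # name assumed to match the website tarball&lt;br /&gt;
 tar -xvf hBN.tar.gz&lt;br /&gt;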
&lt;br /&gt;
After that, you can start the interactive session:&lt;br /&gt;
&lt;br /&gt;
 srun -A tra25_yambo -p dcgp_usr_prod --reservation=s_tra_yambo -N 1 -n 1 -c 4 -t 04:00:00 --gres=tmpfs:10g --pty /bin/bash&lt;br /&gt;
 [...]&lt;br /&gt;
&lt;br /&gt;
Set the environment variable for OpenMP:&lt;br /&gt;
&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
&lt;br /&gt;
Finally, load yambo or yambopy as explained above in the general instructions.&lt;br /&gt;
&lt;br /&gt;
=== DAY 1 - Monday, 19 May === &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;16:15 - 18:30 From the DFT ground state to the complete setup of a Many Body calculation using Yambo&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To get the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-2D.tar.gz&lt;br /&gt;
 $ ls&lt;br /&gt;
 hBN-2D.tar.gz  hBN.tar.gz&lt;br /&gt;
 $ tar -xvf hBN-2D.tar.gz&lt;br /&gt;
 $ tar -xvf hBN.tar.gz&lt;br /&gt;
 $ ls&lt;br /&gt;
 &#039;&#039;&#039;hBN-2D&#039;&#039;&#039; &#039;&#039;&#039;hBN&#039;&#039;&#039; hBN-2D.tar.gz  hBN.tar.gz&lt;br /&gt;
&lt;br /&gt;
Now that you have all the files, you may open the interactive job session with &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; as explained above and proceed with the tutorials.&lt;br /&gt;
&lt;br /&gt;
* [[First steps: walk through from DFT(standalone)|First steps: Initialization and more ]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 2 - Tuesday, 20 May ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;14:30 - 16:30 Linear response&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[Next steps: RPA calculations (standalone)|Next steps: RPA calculations ]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;17:00 - 18:30 Introduction to Yambopy&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
At this point, you may learn about the python pre- and post-processing capabilities offered by yambopy, our python interface to yambo and QE. First of all, let&#039;s create a dedicated directory and copy and extract the related files.&lt;br /&gt;
 &lt;br /&gt;
 $ cd $SCRATCH&lt;br /&gt;
 $ mkdir -p YAMBOPY_TUTORIALS&lt;br /&gt;
 $ cd YAMBOPY_TUTORIALS&lt;br /&gt;
 $ rsync -avzP /leonardo_work/tra25_yambo/YAMBOPY_TUTORIALS/yambopy_tutorial_Modena_2025.tar.gz .&lt;br /&gt;
 $ tar --strip-components=1 -xvzf yambopy_tutorial_Modena_2025.tar.gz&lt;br /&gt;
&lt;br /&gt;
Then, follow part 1 of the tutorial, which is related to DFT band structures, YAMBO initialization and linear response.&lt;br /&gt;
* [[Modena 2025 : Yambopy part 1]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 3 - Wednesday, 21 May ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;11:30 - 12:30 | 14:30 - 16:30 A tour through GW simulation in a complex material (from the blackboard to numerical computation: convergence, algorithms, parallel usage)&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To get all the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/MoS2_2Dquasiparticle_tutorial_Modena2025.tar.gz&lt;br /&gt;
 $ tar -xvf hBN.tar.gz&lt;br /&gt;
 $ tar -xvf MoS2_2Dquasiparticle_tutorial_Modena2025.tar.gz&lt;br /&gt;
 $ cd hBN&lt;br /&gt;
&lt;br /&gt;
Now you can start the first tutorial:&lt;br /&gt;
&lt;br /&gt;
* [[GW tutorial Rome 2023 | GW computations in practice: how to obtain the quasi-particle band structure of a bulk material ]]&lt;br /&gt;
&lt;br /&gt;
If you have gone through the first tutorial, now move on to the second one:&lt;br /&gt;
 &lt;br /&gt;
 $ cd $SCRATCH&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ cd MoS2_HPC_tutorial&lt;br /&gt;
&lt;br /&gt;
* [[Quasi-particles of a 2D system | Quasi-particles of a 2D system ]]&lt;br /&gt;
&lt;br /&gt;
As for yambopy, the tutorial related to GW calculations is contained in the first section of Part 2:&lt;br /&gt;
&lt;br /&gt;
* [[Modena 2025 : Yambopy part 2#GW calculations| Modena 2025 : Yambopy part 2 (GW calculations)]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;17:00 - 18:30 Bethe-Salpeter equation (BSE)&#039;&#039;&#039; Fulvio Paleari (CNR-Nano, Italy), Davide Sangalli (CNR-ISM, Italy)&lt;br /&gt;
&lt;br /&gt;
To get the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz # NOTE: YOU SHOULD ALREADY HAVE THIS FROM DAY 1&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-convergence-kpoints.tar.gz &lt;br /&gt;
 $ tar -xvf hBN-convergence-kpoints.tar.gz&lt;br /&gt;
 $ tar -xvf hBN.tar.gz&lt;br /&gt;
&lt;br /&gt;
Now, you may open the interactive job session with &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; as explained above and proceed with the following tutorials.&lt;br /&gt;
&lt;br /&gt;
* [[Calculating optical spectra including excitonic effects: a step-by-step guide|Perform a BSE calculation from beginning to end ]]&lt;br /&gt;
* [[How to analyse excitons - ICTP 2022 school|Analyse your results (exciton wavefunctions in real and reciprocal space, etc.) ]]&lt;br /&gt;
* [[BSE solvers overview|Solve the BSE eigenvalue problem with different numerical methods]]&lt;br /&gt;
* [[How to choose the input parameters|Choose the input parameters for a meaningful converged calculation]]&lt;br /&gt;
&lt;br /&gt;
Now, go into the yambopy tutorial directory to learn about python analysis tools for the BSE:&lt;br /&gt;
 $ cd $SCRATCH&lt;br /&gt;
 $ cd YAMBOPY_TUTORIALS/databases_yambopy&lt;br /&gt;
&lt;br /&gt;
* [[Modena 2025 : Yambopy part 2#Excitons| Modena 2025 : Yambopy part 2 (BSE calculations)]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 4 - Thursday, 22 May ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;14:30 - 16:00 Bethe-Salpeter (part 2)&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;16:30 - 17:30 Nonlinear response with the time-dependent Berry phase&#039;&#039;&#039; Myrta Gruning (Queen&#039;s University Belfast), Davide Sangalli (CNR-ISM, Italy)&lt;br /&gt;
&lt;br /&gt;
For the tutorials we will first use the &amp;lt;code&amp;gt;hBN-2D-RT&amp;lt;/code&amp;gt; folder (k-sampling 10x10x1) and then the &amp;lt;code&amp;gt;hBN-2D&amp;lt;/code&amp;gt; folder (k-sampling 6x6x1).&lt;br /&gt;
You may already have them in the &amp;lt;code&amp;gt;YAMBO_TUTORIALS&amp;lt;/code&amp;gt; folder:&lt;br /&gt;
 $ ls&lt;br /&gt;
 &#039;&#039;&#039;hBN-2D&#039;&#039;&#039; &#039;&#039;&#039;hBN-2D-RT&#039;&#039;&#039; hBN-2D.tar.gz  hBN-2D-RT.tar.gz&lt;br /&gt;
&lt;br /&gt;
If you need to download the tutorial files again, follow these steps (or see the instructions above):&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-2D.tar.gz&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-2D-RT.tar.gz&lt;br /&gt;
 $ tar -xvf hBN-2D.tar.gz&lt;br /&gt;
 $ tar -xvf hBN-2D-RT.tar.gz&lt;br /&gt;
&lt;br /&gt;
* [[Prerequisites for Real Time propagation with Yambo (5.3)|Perform the setup for a real-time calculation]]&lt;br /&gt;
* [[Dielectric function from Bloch-states dynamics (5.3)|Dielectric function from Bloch-states dynamics]]&lt;br /&gt;
* [[Second-harmonic generation of 2D-hBN (5.3)|Second-harmonic generation of 2D-hBN]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 5 - Friday, 23 May ===&lt;br /&gt;
&lt;br /&gt;
== Lectures ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== DAY 1 - Monday, 19 May ===&lt;br /&gt;
&lt;br /&gt;
* D. Varsano, [https://drive.google.com/file/d/1lbY6zF04WCcvZZhQy4TAIBca9wXHVmGG/view?usp=share_link Description and goal of the school].&lt;br /&gt;
* C. Franchini, [https://drive.google.com/file/d/1Z6GCjP4K1dM28ULsyYg2eckgUdYUSRph/view?usp=share_link First principles and data-driven correlated materials]&lt;br /&gt;
* F. Mohamed, [https://drive.google.com/file/d/1ITddkGTM12Gw5QxnZjAQpfZgYH0FvJL1/view?usp=share_link A tour on Density Functional Theory]&lt;br /&gt;
* E. Cannuccia, [https://drive.google.com/file/d/1mBTcPrnfoqwcA5wXE8gXQMO_qttClHAd/view?usp=share_link Electronic screening and linear response theory]&lt;br /&gt;
&lt;br /&gt;
=== DAY 2 - Tuesday, 20 May ===&lt;br /&gt;
&lt;br /&gt;
* A. Marini, [https://drive.google.com/file/d/1HTIPHkH2sBaVDLFwwS34T-fJ9x8FhVPq/view?usp=share_link Introduction to Many-Body Perturbation Theory]&lt;br /&gt;
* C. Cardoso, [https://drive.google.com/file/d/1SR9BtFKgz6Y1gaHSKF1s8xzb42D5C1Xg/view?usp=share_link Quasiparticles and the GW Approximation]&lt;br /&gt;
* A. Guandalini, [https://drive.google.com/file/d/1dgcdHMfA0b7jjyrCs4r9OrG6qpiu1v39/view?usp=share_link GW in practice: algorithms and approximations]&lt;br /&gt;
* G. Sesti, [https://drive.google.com/file/d/1te_85k9jgSymr3Av86rKOu0-tA-7sGWq/view?usp=sharing  GW advanced algorithms]&lt;br /&gt;
* M. Govoni, [https://drive.google.com/file/d/1XBa5RgmwKdYSy4mj_COXwbUQd3DPgRe4/view?usp=share_link GW without empty states and investigation of neutral excitations by embedding full configuration interaction in DFT+GW]&lt;br /&gt;
&lt;br /&gt;
=== DAY 3 - Wednesday, 21 May ===&lt;br /&gt;
&lt;br /&gt;
* M. Palummo, [https://drive.google.com/file/d/1pQ491hqpETVLchL92QPy4f_jWqfMK5xf/view?usp=share_link Optical absorption and excitons via the Bethe-Salpeter Equation]&lt;br /&gt;
* D. Sangalli, Real-time simulations&lt;br /&gt;
* F. Paleari, Introduction to YamboPy (automation and post-processing)&lt;br /&gt;
&lt;br /&gt;
=== DAY 4 - Thursday, 22 May ===&lt;br /&gt;
&lt;br /&gt;
* E. Luppi, An introduction to Non-linear spectroscopy&lt;br /&gt;
* M. Grüning, [https://drive.google.com/file/d/1bZF0f3AD-WL3M3vCtvrnA_1W94SKt-Gf/view?usp=sharing Non-linear spectroscopy in Yambo]&lt;br /&gt;
* F. Affinito, Frontiers in High-Performance Computing&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8780</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8780"/>
		<updated>2025-05-19T21:50:54Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*GW acceleration techniques, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms available in Yambo that speed up the convergence of the self-energy with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands to be already converged. Now compute the gap at the GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
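&lt;br /&gt;
If you want to extract the gap directly from the output, here is a rough sketch (it assumes the usual &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; column layout, i.e. K-point, Band, Eo [eV], E-Eo [eV], ...): the quasiparticle energy of each band is Eo plus the GW correction, and the gap is the difference between bands 14 and 13 at k-point 7.&lt;br /&gt;
 # sketch: assumes column 2 = Band, 3 = Eo [eV], 4 = E-Eo [eV] in o-80b_10Ry.qp&lt;br /&gt;
 awk &#039;!/^#/ {E[$2]=$3+$4} END {printf &amp;quot;GW gap: %.3f eV\n&amp;quot;, E[14]-E[13]}&#039; o-80b_10Ry.qp&lt;br /&gt;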
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the convergence with the k-grid and achieve a speed-up, briefly consider again the expression for the exchange part of the self-energy:&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually, Yambo performs this integration analytically inside a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code misses the part of the integral lying outside the sphere. This is usually not a problem, because for a large number of q and k points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential that we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral via Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of dense k-point grids the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with the gamma point only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration around q=0 of the Coulomb potential via the RIM approach does not work well for the correlation part of the self-energy,&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag to your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation using the RIM-W algorithm:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential in the correlation part of the self-energy.&lt;br /&gt;
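&lt;br /&gt;
If you want to compare the three runs side by side, a minimal sketch using the output names chosen above is:&lt;br /&gt;
 for run in 80b_10Ry 80b_10Ry_rim 80b_10Ry_rimw ; do&lt;br /&gt;
   echo &amp;quot;== $run ==&amp;quot;&lt;br /&gt;
   grep -v &#039;^#&#039; o-${run}.qp&lt;br /&gt;
 done&lt;br /&gt;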
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the quasi-particle corrections for the full band structure, over a larger number of bands and k-points than in the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively, but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see that we now compute a much larger number of quasi-particles (7 k-points x 5 bands = 35 QP corrections, instead of the 2 computed before):&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the Slurm submission script &amp;lt;code&amp;gt;job_parallel.sh&amp;lt;/code&amp;gt;, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;             # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0                # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;  # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;    # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1    # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0                  # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;             # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                 # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), and &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; is the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
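&lt;br /&gt;
In each &amp;lt;code&amp;gt;*_CPU&amp;lt;/code&amp;gt; string, the numbers are the MPI tasks assigned to the corresponding roles listed in &amp;lt;code&amp;gt;*_ROLEs&amp;lt;/code&amp;gt;, and their product should match the total number of MPI tasks. As a sketch (not part of the provided script), with 4 tasks the self-energy could also be distributed over both quasi-particle states and bands instead of only over quasi-particle states:&lt;br /&gt;
 SE_CPU= &amp;quot;1 2 2&amp;quot;               # [PARALLEL] 1 task on q, 2 on qp, 2 on b&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;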
&lt;br /&gt;
We start by calculating the QP corrections using only MPI tasks and a single OpenMP thread. Therefore, create the submission script as follows:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra25_yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:04:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=rutile               # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=dcgp_usr_prod       # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --gres=tmpfs:10g                # List of generic consumable resources&lt;br /&gt;
 #SBATCH --qos=normal                    # Quality of service&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school&lt;br /&gt;
 #&lt;br /&gt;
 #SBATCH --nodes=1                       # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=4             # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --cpus-per-task=1               # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job:&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI4_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like Leonardo, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks should be equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute the 4 CPUs over 2 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
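&lt;br /&gt;
One simple way to compare the different runs, relying only on standard Slurm accounting, is to check the elapsed time of your recent jobs:&lt;br /&gt;
 sacct -X --format=JobID,JobName,AllocCPUS,Elapsed&lt;br /&gt;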
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, therefore we have still the total number of tasks equal to 8. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here in fact, we are not using OpenMP so &amp;lt;/code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the gpu compilation might be different on your machine, or the executable may be the same with no need to load an additional modules. In conclusion, we are running over 4 GPU cards, distributed with 2 MPI tasks, and using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, about 2 minutes instead of 2 min and 30s with a purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8779</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8779"/>
		<updated>2025-05-19T21:49:56Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. Aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material based on:&lt;br /&gt;
*Acceleration techniques of GW, some of which specific for 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once terminated the computation, you can now inspect the output file &amp;lt;code&amp;gt; o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 here is problematic for the presence of a 1/q from the Coulomb potential that diverges, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code lost part of the integral out of the circle. This usually &lt;br /&gt;
is not problematic because for a large number of q and k point the missing term goes to zero.&lt;br /&gt;
However in system that requires few k-points or even only the gamma one, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;. So as&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of component of the Coulomb potential we want integrate numerically and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interaction between periodically repeated copies of the simulation supercell along the z-direction (that in this case is the non periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of large k points the final results will be the same of the standard method.&lt;br /&gt;
However this correction is important for systems that converge with few k-points or with gamma only and it is applied both at the exchange and correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration around q=0 of the Coulomb potential via the RIM approach does not work well for the correlation part of self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
an horizontal dashed line. Orange lines: RIM-W method.  The grey shaded regions show&lt;br /&gt;
the converge tolerance (±50 meV) centered at the centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, it has been developed a method specific for 2D systems that performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7| Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm adding a further flag to your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; be always smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV. A large decrease compared to the value obtained when performing a RIM integration of only the bare Coulomb potential within the correlation part of self-energy.&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy on Yambo. As a test calculation, we compute the full band structure on a larger number of bands with respect the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Differently to the approach used up to now, we will not work interactively but rather we will submit a job script as is commonly done when using Yambo on clusters. So, now exit the interactive mode and in the login node access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;             # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0                # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;  # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;    # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1    # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0                  # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;             # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                 # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra25_yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:04:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=rutile               # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=dcgp_usr_prod       # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --gres=tmpfs:10g                # List of generic consumable resources&lt;br /&gt;
 #SBATCH --qos=normal                    # Quality of service&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school&lt;br /&gt;
 #&lt;br /&gt;
 #SBATCH --nodes=1                       # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=4             # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --cpus-per-task=1               # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
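&lt;br /&gt;
For example, with the default label produced by the script above (4 MPI tasks, 1 OpenMP thread) you should find:&lt;br /&gt;
&lt;br /&gt;
 ls run_MPI4_OMP1/            # databases (the -J directory)&lt;br /&gt;
 ls run_MPI4_OMP1.out/        # report r-* and output o-* files (the -C directory)&lt;br /&gt;
 ls run_MPI4_OMP1.out/LOG/    # one log l-* per MPI task&lt;br /&gt;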
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI4_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like Leonardo, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
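&lt;br /&gt;
As a sanity check you can print this product from inside the job script (a minimal sketch; note that &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; is only defined when &amp;lt;code&amp;gt;--cpus-per-task&amp;lt;/code&amp;gt; is set):&lt;br /&gt;
&lt;br /&gt;
 # total cores = MPI tasks x OpenMP threads per task&lt;br /&gt;
 echo &amp;quot;using $(( SLURM_NTASKS * SLURM_CPUS_PER_TASK )) cores in total&amp;quot;&lt;br /&gt;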
&lt;br /&gt;
Therefore, we can distribute 4 CPUs over 2 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Alternatively, any positive number directly specifies the number of threads to be used in the corresponding section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you can expect a significant gain from OpenMP when you are already close to the limit of efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
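&lt;br /&gt;
One possible way to scan a few combinations without editing the script each time is to override the &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; directives from the command line (sbatch options take precedence over the directives in the script) and then compare the elapsed times with &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt;; adapt the task/thread pairs to what you want to test:&lt;br /&gt;
&lt;br /&gt;
 sbatch --ntasks-per-node=4 --cpus-per-task=1 job_parallel.sh&lt;br /&gt;
 sbatch --ntasks-per-node=2 --cpus-per-task=2 job_parallel.sh&lt;br /&gt;
 sbatch --ntasks-per-node=1 --cpus-per-task=4 job_parallel.sh&lt;br /&gt;
 &lt;br /&gt;
 # once the jobs have finished (replace the ids with your own)&lt;br /&gt;
 sacct -j jobid1,jobid2,jobid3 --format=JobID,JobName,Elapsed,NTasks,AllocCPUS&lt;br /&gt;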
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task. OpenMP threading is allowed, but not with too many threads, otherwise we lose efficiency. In fact, here we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running over 4 GPU cards, distributed with 2 MPI tasks and using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 min 30 s of a purely CPU calculation. The gain can become even greater for larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8778</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8778"/>
		<updated>2025-05-19T21:47:50Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*Acceleration techniques of GW, some of which specific for 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
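&lt;br /&gt;
If you prefer a quick command-line check, a minimal sketch is the one-liner below. It assumes the usual &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; column layout (k-point, band, Eo, E-Eo, ...), so verify the column order against the header of your own output first:&lt;br /&gt;
&lt;br /&gt;
 # Eqp = Eo (col 3) + E-Eo (col 4); print E(band 14) - E(band 13)&lt;br /&gt;
 awk &#039;!/^#/ { e[$2]=$3+$4 } END { print e[14]-e[13], &amp;quot;eV&amp;quot; }&#039; o-80b_10Ry.qp&lt;br /&gt;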
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside the sphere. This is usually not problematic, because for a large number of q- and k-points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, needed to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential in the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure, including a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;             # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0                # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;  # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;    # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1    # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0                  # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;             # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                 # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function itself (it stands for χ, since it is a response function), and &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; is the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra25_yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:10:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=rutile               # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=dcgp_usr_prod       # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --gres=tmpfs:10g                # List of generic consumable resources&lt;br /&gt;
 #SBATCH --qos=normal                    # Quality of service&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school&lt;br /&gt;
 #&lt;br /&gt;
 #SBATCH --nodes=1                       # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=4             # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --cpus-per-task=1               # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
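&lt;br /&gt;
For example, with the default label produced by the script above (4 MPI tasks, 1 OpenMP thread) you should find:&lt;br /&gt;
&lt;br /&gt;
 ls run_MPI4_OMP1/            # databases (the -J directory)&lt;br /&gt;
 ls run_MPI4_OMP1.out/        # report r-* and output o-* files (the -C directory)&lt;br /&gt;
 ls run_MPI4_OMP1.out/LOG/    # one log l-* per MPI task&lt;br /&gt;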
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI4_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like Leonardo, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
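&lt;br /&gt;
As a sanity check you can print this product from inside the job script (a minimal sketch; note that &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; is only defined when &amp;lt;code&amp;gt;--cpus-per-task&amp;lt;/code&amp;gt; is set):&lt;br /&gt;
&lt;br /&gt;
 # total cores = MPI tasks x OpenMP threads per task&lt;br /&gt;
 echo &amp;quot;using $(( SLURM_NTASKS * SLURM_CPUS_PER_TASK )) cores in total&amp;quot;&lt;br /&gt;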
&lt;br /&gt;
Therefore, we can distribute 4 CPUs over 2 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Alternatively, any positive number directly specifies the number of threads to be used in the corresponding section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you can expect a significant gain from OpenMP when you are already close to the limit of efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
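&lt;br /&gt;
One possible way to scan a few combinations without editing the script each time is to override the &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; directives from the command line (sbatch options take precedence over the directives in the script) and then compare the elapsed times with &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt;; adapt the task/thread pairs to what you want to test:&lt;br /&gt;
&lt;br /&gt;
 sbatch --ntasks-per-node=4 --cpus-per-task=1 job_parallel.sh&lt;br /&gt;
 sbatch --ntasks-per-node=2 --cpus-per-task=2 job_parallel.sh&lt;br /&gt;
 sbatch --ntasks-per-node=1 --cpus-per-task=4 job_parallel.sh&lt;br /&gt;
 &lt;br /&gt;
 # once the jobs have finished (replace the ids with your own)&lt;br /&gt;
 sacct -j jobid1,jobid2,jobid3 --format=JobID,JobName,Elapsed,NTasks,AllocCPUS&lt;br /&gt;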
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task. OpenMP threading is allowed, but not with too many threads, otherwise we lose efficiency. In fact, here we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running over 4 GPU cards, distributed with 2 MPI tasks and using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 min 30 s of a purely CPU calculation. The gain can become even greater for larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8777</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8777"/>
		<updated>2025-05-19T21:47:11Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*Acceleration techniques of GW, some of which specific for 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
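&lt;br /&gt;
If you prefer a quick command-line check, a minimal sketch is the one-liner below. It assumes the usual &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; column layout (k-point, band, Eo, E-Eo, ...), so verify the column order against the header of your own output first:&lt;br /&gt;
&lt;br /&gt;
 # Eqp = Eo (col 3) + E-Eo (col 4); print E(band 14) - E(band 13)&lt;br /&gt;
 awk &#039;!/^#/ { e[$2]=$3+$4 } END { print e[14]-e[13], &amp;quot;eV&amp;quot; }&#039; o-80b_10Ry.qp&lt;br /&gt;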
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside the sphere. This is usually not problematic, because for a large number of q- and k-points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, needed to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential in the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure, including a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;             # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0                # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;  # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;    # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1    # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0                  # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;             # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                 # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function itself (it stands for χ, since it is a response function), and &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; is the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra25_yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:10:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=rutile               # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=dcgp_usr_prod       # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --gres=tmpfs:10g                # List of generic consumable resources&lt;br /&gt;
 #SBATCH --qos=normal                    # Quality of service&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school&lt;br /&gt;
 #&lt;br /&gt;
 #SBATCH --nodes=1                       # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=4             # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --cpus-per-task=1               # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
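&lt;br /&gt;
For example, with the default label produced by the script above (4 MPI tasks, 1 OpenMP thread) you should find:&lt;br /&gt;
&lt;br /&gt;
 ls run_MPI4_OMP1/            # databases (the -J directory)&lt;br /&gt;
 ls run_MPI4_OMP1.out/        # report r-* and output o-* files (the -C directory)&lt;br /&gt;
 ls run_MPI4_OMP1.out/LOG/    # one log l-* per MPI task&lt;br /&gt;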
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI4_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like Leonardo, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
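&lt;br /&gt;
As a sanity check you can print this product from inside the job script (a minimal sketch; note that &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; is only defined when &amp;lt;code&amp;gt;--cpus-per-task&amp;lt;/code&amp;gt; is set):&lt;br /&gt;
&lt;br /&gt;
 # total cores = MPI tasks x OpenMP threads per task&lt;br /&gt;
 echo &amp;quot;using $(( SLURM_NTASKS * SLURM_CPUS_PER_TASK )) cores in total&amp;quot;&lt;br /&gt;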
&lt;br /&gt;
Therefore, we can distribute 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Alternatively, any positive number directly specifies the number of threads to be used in the corresponding section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you can expect a significant gain from OpenMP when you are already close to the limit of efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
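&lt;br /&gt;
One possible way to scan a few combinations without editing the script each time is to override the &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; directives from the command line (sbatch options take precedence over the directives in the script) and then compare the elapsed times with &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt;; adapt the task/thread pairs to what you want to test:&lt;br /&gt;
&lt;br /&gt;
 sbatch --ntasks-per-node=4 --cpus-per-task=1 job_parallel.sh&lt;br /&gt;
 sbatch --ntasks-per-node=2 --cpus-per-task=2 job_parallel.sh&lt;br /&gt;
 sbatch --ntasks-per-node=1 --cpus-per-task=4 job_parallel.sh&lt;br /&gt;
 &lt;br /&gt;
 # once the jobs have finished (replace the ids with your own)&lt;br /&gt;
 sacct -j jobid1,jobid2,jobid3 --format=JobID,JobName,Elapsed,NTasks,AllocCPUS&lt;br /&gt;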
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, therefore we still have a total number of tasks equal to 8. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load any additional modules. In conclusion, we are running over 4 GPU cards, distributed with 2 MPI tasks, and using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 min 30 s of the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8776</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8776"/>
		<updated>2025-05-19T21:45:49Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. Aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material based on:&lt;br /&gt;
*Acceleration techniques of GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has terminated, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic here because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that lies outside the sphere. This usually &lt;br /&gt;
is not problematic because, for a large number of q- and k-points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even the gamma point only, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interaction between periodically repeated copies of the simulation supercell along the z-direction (that in this case is the non periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points or with gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
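&lt;br /&gt;
As a rough illustration of the idea behind this random integration (a minimal sketch, not Yambo’s actual implementation), the following Python snippet estimates the integral of 1/|q| over a small 2D cell around q=0 by Monte Carlo sampling; the estimate is finite even though the integrand diverges at q=0:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 # Monte Carlo estimate of the integral of 1/|q| over a small 2D cell&lt;br /&gt;
 # centred at q=0 (illustrative only, not the kernel Yambo integrates)&lt;br /&gt;
 rng = np.random.default_rng(0)&lt;br /&gt;
 half_side = 0.05                                           # half side of the cell around q=0&lt;br /&gt;
 q = rng.uniform(-half_side, half_side, size=(1000000, 2))  # random q-points, as RandQpts does&lt;br /&gt;
 integrand = 1.0 / np.linalg.norm(q, axis=1)&lt;br /&gt;
 cell_area = (2.0 * half_side) ** 2&lt;br /&gt;
 print(cell_area * integrand.mean())                        # finite, despite the 1/q divergence&lt;br /&gt;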
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration around q=0 of the Coulomb potential via the RIM approach does not work well for the correlation part of self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show&lt;br /&gt;
the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed that performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; alone when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag to your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential within the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure on a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;             # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0                # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;  # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;    # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1    # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0                  # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;             # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                 # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
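&lt;br /&gt;
For instance (a sketch; in Yambo the product of the CPU counts in each string should match the total number of MPI tasks), with 8 MPI tasks one could spread the response function over conduction and valence bands instead of assigning everything to &amp;lt;code&amp;gt;c&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 4 2&amp;quot;     # [PARALLEL] 4 tasks over conduction (c) and 2 over valence (v) bands&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;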
&lt;br /&gt;
We start by calculating the QP corrections using only MPI tasks and a single OpenMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra25_yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:10:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=rutile               # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=dcgp_usr_prod       # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --gres=tmpfs:10g                # List of generic consumable resources&lt;br /&gt;
 #SBATCH --qos=normal                    # Quality of service&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school&lt;br /&gt;
 #&lt;br /&gt;
 #SBATCH --nodes=1                       # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=NCPU          # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --cpus-per-task=NOMP            # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to activate the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
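&lt;br /&gt;
A convenient way to scan combinations without editing the script each time (assuming you have already replaced the NCPU/NOMP placeholders) is to override the directives from the command line, since options passed to sbatch take precedence over the #SBATCH lines in the script:&lt;br /&gt;
&lt;br /&gt;
 sbatch --ntasks-per-node=2 --cpus-per-task=4 job_parallel.sh&lt;br /&gt;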
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, therefore we still have a total number of tasks equal to 8. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load any additional modules. In conclusion, we are running over 4 GPU cards, distributed with 2 MPI tasks, and using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 min 30 s of the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8775</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8775"/>
		<updated>2025-05-19T21:42:43Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. Aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material based on:&lt;br /&gt;
*Acceleration techniques of GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has terminated, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic here because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that lies outside the sphere. This usually &lt;br /&gt;
is not problematic because, for a large number of q- and k-points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even the gamma point only, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interaction between periodically repeated copies of the simulation supercell along the z-direction (that in this case is the non periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points or with gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration around q=0 of the Coulomb potential via the RIM approach does not work well for the correlation part of self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show&lt;br /&gt;
the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed that performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; alone when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag to your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential within the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure on a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only MPI tasks and a single OpenMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra25_yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:10:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=rutile                  # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=dcgp_usr_prod      # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --gres=tmpfs:10g                # List of generic consumable resources&lt;br /&gt;
 #SBATCH --qos=normal                    # Quality of service&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school&lt;br /&gt;
 #&lt;br /&gt;
 #SBATCH --nodes=1                       # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=NCPU          # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --cpus-per-task=NOMP            # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to activate the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, therefore we still have a total number of tasks equal to 8. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load any additional modules. In conclusion, we are running over 4 GPU cards, distributed with 2 MPI tasks, and using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 min 30 s of the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8774</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8774"/>
		<updated>2025-05-19T21:41:35Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. Aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material based on:&lt;br /&gt;
*Acceleration techniques of GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already converged. Now compute the gap at the GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
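If you want to read the gap off directly from the command line, a minimal sketch is the awk one-liner below. It assumes the usual column layout of the &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; output (k-point, band, Eo and E-Eo in the first four columns); check the header of your file before trusting it.&lt;br /&gt;
&lt;br /&gt;
 # hypothetical sketch: add Eo and E-Eo for bands 13-14 at k-point 7 and print their difference&lt;br /&gt;
 awk &#039;$1==7 {eqp[$2]=$3+$4} END {print eqp[14]-eqp[13]}&#039; o-80b_10Ry.qp&lt;br /&gt;
&lt;br /&gt;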
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically inside a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside the sphere. This usually&lt;br /&gt;
is not problematic because for a large number of q and k points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even the gamma point only, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential that we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral via Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, used to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a dense k-point grid the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with the gamma point only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
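The same awk sketch used above, pointed at the new output file, should reproduce this value:&lt;br /&gt;
&lt;br /&gt;
 awk &#039;$1==7 {eqp[$2]=$3+$4} END {print eqp[14]-eqp[13]}&#039; o-80b_10Ry_rim.qp&lt;br /&gt;
&lt;br /&gt;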
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show&lt;br /&gt;
the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; alone when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation using the RIM-W algorithm:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when performing the RIM integration of only the bare Coulomb potential within the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
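To compare the three schemes at a glance, you could loop over the output files with the same hypothetical one-liner (same column assumptions as above):&lt;br /&gt;
&lt;br /&gt;
 # print the gap extracted from each run: plain, RIM and RIM-W&lt;br /&gt;
 for run in 80b_10Ry 80b_10Ry_rim 80b_10Ry_rimw; do&lt;br /&gt;
   awk -v tag=$run &#039;$1==7 {eqp[$2]=$3+$4} END {print tag, eqp[14]-eqp[13]}&#039; o-$run.qp&lt;br /&gt;
 done&lt;br /&gt;
&lt;br /&gt;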
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use parallel strategies in Yambo. As a test calculation, we compute the full band structure on a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively, but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
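For each section, the product of the entries in the CPU string has to match the number of MPI tasks assigned to it. For instance, with 4 MPI tasks one could also split the response function over both k-points and conduction bands (a sketch, not part of the provided script):&lt;br /&gt;
&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 2 2 1&amp;quot;     # [PARALLEL] 2 CPUs on k times 2 CPUs on c = 4 MPI tasks&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
&lt;br /&gt;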
We start by calculating the QP corrections using only MPI tasks and a single OpenMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra25_yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:10:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=rutile                  # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=dcgp_usr_prod      # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --gres=tmpfs:10g                # List of generic consumable resources&lt;br /&gt;
 #SBATCH --qos=normal                    # Quality of service&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school&lt;br /&gt;
 #&lt;br /&gt;
 #SBATCH --nodes=1                       # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=4             # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --cpus-per-task=1               # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load profile/candidate&lt;br /&gt;
 module use /leonardo/pub/userexternal/nspallan/spack-0.22.2-06/modules&lt;br /&gt;
 module load yambo/5.3.0--intel-oneapi-mpi--2021.12.1--oneapi--2024.1.0&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
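While you wait, you can check the state of the job in the Slurm queue with:&lt;br /&gt;
&lt;br /&gt;
 squeue -u $USER&lt;br /&gt;
&lt;br /&gt;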
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI4_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, the gain you can expect from OpenMP depends on how close you already are to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
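One simple way to compare the timings of the different runs is to query the Slurm accounting database once the jobs have finished (a generic sketch; output fields may vary with the Slurm configuration):&lt;br /&gt;
&lt;br /&gt;
 # list today&#039;s jobs with the number of allocated CPUs and the elapsed wall time&lt;br /&gt;
 sacct -u $USER -S today -o JobID,JobName%20,AllocCPUS,Elapsed&lt;br /&gt;
&lt;br /&gt;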
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower the memory requirements within a node. You can try to increase the OpenMP share of threads if you run into Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here, in fact, we are not really using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running on 4 GPU cards, distributed over 2 MPI tasks, and using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 min 30 s needed by a purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8772</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8772"/>
		<updated>2025-05-19T21:39:30Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*Acceleration techniques for GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms available in Yambo that speed up the convergence of the self-energy with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already converged. Now compute the gap at the GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically inside a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside the sphere. This usually&lt;br /&gt;
is not problematic because for a large number of q and k points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even the gamma point only, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential that we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral via Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, used to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a dense k-point grid the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with the gamma point only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
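&lt;br /&gt;
If you want to look only at the numerical data, you can filter out the header lines of the output file (they start with #); remember that the gap is the difference between the quasiparticle energies of bands 13 and 14 at the k-point selected in &amp;lt;code&amp;gt;QPkrange&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 # show only the data rows of the QP output file&lt;br /&gt;
 grep -v &amp;quot;^#&amp;quot; o-80b_10Ry_rim.qp&lt;br /&gt;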
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
since the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
Three further lines should now have appeared in your input:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation using the RIM-W algorithm: &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential in the correlation part of the self-energy.&lt;br /&gt;
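&lt;br /&gt;
To summarize the three runs of this step (values from this tutorial, obtained with deliberately unconverged parameters):&lt;br /&gt;
&lt;br /&gt;
 80b_10Ry       (no RIM)       GW gap = 2.483 eV&lt;br /&gt;
 80b_10Ry_rim   (RIM)          GW gap = 4.116 eV&lt;br /&gt;
 80b_10Ry_rimw  (RIM + RIM-W)  GW gap = 2.806 eV&lt;br /&gt;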
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure for a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively, but rather submit a job script, as is commonly done when using Yambo on clusters. So, now exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see that we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the SLURM submission script &amp;lt;code&amp;gt;job_parallel.sh&amp;lt;/code&amp;gt;, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; refers to the screening function itself (it stands for χ since it is a response function), and &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; to the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
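&lt;br /&gt;
As an illustration (the numbers below are hypothetical and not part of the tutorial files), the entries of each &amp;lt;code&amp;gt;*_CPU&amp;lt;/code&amp;gt; string are matched one-to-one with the corresponding &amp;lt;code&amp;gt;*_ROLEs&amp;lt;/code&amp;gt; string, and their product should equal the total number of MPI tasks. For instance, with 8 MPI tasks one could write:&lt;br /&gt;
&lt;br /&gt;
 SE_CPU= &amp;quot;1 4 2&amp;quot;            # 1 MPI task over q, 4 over qp, 2 over b (1*4*2 = 8)&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;&lt;br /&gt;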
&lt;br /&gt;
We start by calculating the QP corrections using MPI tasks only, with a single OpenMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
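&lt;br /&gt;
You can check that the job has been queued and is running with the standard SLURM command:&lt;br /&gt;
&lt;br /&gt;
 squeue -u $USER&lt;br /&gt;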
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to enable OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute the 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
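&lt;br /&gt;
Since the script builds the run label from the SLURM variables, with these settings the new run should use the label &amp;lt;code&amp;gt;MPI4_OMP2&amp;lt;/code&amp;gt;, i.e.:&lt;br /&gt;
&lt;br /&gt;
 run_MPI4_OMP2        # databases (-J option)&lt;br /&gt;
 run_MPI4_OMP2.out    # report and log files (-C option)&lt;br /&gt;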
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Alternatively, any positive number directly specifies the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
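&lt;br /&gt;
For example (hypothetical values, not needed for this tutorial), forcing two threads only in the self-energy section would read:&lt;br /&gt;
&lt;br /&gt;
 SE_Threads= 2       # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;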
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if the MPI parallelisation alone is already scaling efficiently.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
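&lt;br /&gt;
A minimal sketch of how this scan could be automated from the login node (assuming your reservation and partition allow these job sizes) is to override the SBATCH directives from the command line, and then compare the elapsed times reported by SLURM:&lt;br /&gt;
&lt;br /&gt;
 # submit a few MPI x OpenMP combinations with the same script&lt;br /&gt;
 for combo in &amp;quot;8 1&amp;quot; &amp;quot;4 2&amp;quot; &amp;quot;2 4&amp;quot; &amp;quot;1 8&amp;quot;; do&lt;br /&gt;
     set -- $combo                    # $1 = MPI tasks, $2 = OpenMP threads&lt;br /&gt;
     sbatch --ntasks-per-node=$1 --cpus-per-task=$2 job_parallel.sh&lt;br /&gt;
 done&lt;br /&gt;
 &lt;br /&gt;
 # when the jobs are done, compare the elapsed times reported by SLURM&lt;br /&gt;
 sacct -u $USER --format=JobID,JobName,Elapsed&lt;br /&gt;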
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower the memory requirements within a node. You can try to increase the OpenMP share of threads if you run into Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, therefore the total number of tasks is still equal to 8. OpenMP threading is allowed, but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running over 4 GPU cards, distributed over 2 MPI tasks, and using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 min and 30 s of a purely CPU calculation. The gain can become even greater for larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8771</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8771"/>
		<updated>2025-05-19T21:38:30Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*Acceleration techniques for GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic here due to the presence of a diverging 1/q coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually, in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside the sphere. This usually &lt;br /&gt;
is not problematic because, for a large number of q- and k-points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;, as follows:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential that we want to integrate numerically, while &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral via Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, which avoids spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (in this case the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a dense k-point grid the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or even with the gamma point only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration around q=0 of the Coulomb potential via the RIM approach does not work well for the correlation part of self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation using the RIM-W algorithm: &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential in the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure for a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively, but rather submit a job script, as is commonly done when using Yambo on clusters. So, now exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see that we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, instead the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to activate the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if the MPI parallelisation alone is already scaling efficiently.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
!&lt;br /&gt;
!For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
#So now move to:&lt;br /&gt;
&lt;br /&gt;
# 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
#Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the #calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
# vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
# #!/bin/bash&lt;br /&gt;
# #SBATCH --nodes=1&lt;br /&gt;
# #SBATCH --ntasks-per-node=2&lt;br /&gt;
# #SBATCH --cpus-per-task=1&lt;br /&gt;
# #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
#each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, therefore we have still #the total number of tasks equal to 8. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here in fact, we are not #using OpenMP so &amp;lt;/code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
#In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific #compilation of the code.&lt;br /&gt;
&lt;br /&gt;
# module purge&lt;br /&gt;
# module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
# export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
#In general, the gpu compilation might be different on your machine, or the executable may be the same with no need to load an additional #modules. In conclusion, we are running over 4 GPU cards, distributed with 2 MPI tasks, and using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 min and 30 s of a purely CPU calculation. The gain can become even greater for larger systems. You can have a look at the results collected in folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8770</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8770"/>
		<updated>2025-05-19T21:38:03Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*Acceleration techniques for GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic here due to the presence of a diverging 1/q coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually, in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside the sphere. This usually &lt;br /&gt;
is not problematic because, for a large number of q- and k-points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;, as follows:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential that we want to integrate numerically, while &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral via Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, which avoids spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (in this case the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a dense k-point grid the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or even with the gamma point only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration around q=0 of the Coulomb potential via the RIM approach does not work well for the correlation part of self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation using the RIM-W algorithm: &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential in the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure for a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively, but rather submit a job script, as is commonly done when using Yambo on clusters. So, now exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see that we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
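&lt;br /&gt;
Note that the product of the entries in each &amp;lt;code&amp;gt;*_CPU&amp;lt;/code&amp;gt; string has to match the total number of MPI tasks. As a sketch (hypothetical values, not the ones used in the script below), with 8 tasks you could also distribute the response function over both conduction and valence bands:&lt;br /&gt;
&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 4 2&amp;quot;     # [PARALLEL] 4 tasks over c and 2 over v: 1*1*1*4*2 = 8&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;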
&lt;br /&gt;
We start by calculating the QP corrections using MPI tasks only, with a single OpenMP thread per task. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=8&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
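&lt;br /&gt;
For instance, once the job has started you can list the output directory created through the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; option (a sketch, assuming the 8-task label used above):&lt;br /&gt;
&lt;br /&gt;
 ls run_MPI8_OMP1.out         # report (r-*) file; the o-* output files appear at the end of the run&lt;br /&gt;
 ls run_MPI8_OMP1.out/LOG     # one log file (l-*) per MPI task&lt;br /&gt;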
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying &amp;lt;code&amp;gt;cpus-per-task&amp;lt;/code&amp;gt; in the submission file. The product of the number of OpenMP threads and the number of MPI tasks gives the total number of CPUs.&lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute the 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can be used to directly specify the number of threads in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI run from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
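&lt;br /&gt;
One convenient way to compare the wall times of the different runs, once the jobs have finished, is to query the SLURM accounting database (a sketch; replace the job IDs with the ones returned by &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
 sacct -j 123456,123457 --format=JobID,JobName,NTasks,Elapsed,State&lt;br /&gt;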
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower the memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== (hidden: no GPU partition at the Yambo 2025 school)&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs.&lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to considerable speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task. OpenMP threading is allowed, but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running over 4 GPU cards, distributed with 2 MPI tasks, and using 1 thread. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster: about 2 minutes instead of 2 min 30 s for the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8769</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8769"/>
		<updated>2025-05-19T21:35:40Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*GW acceleration techniques, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
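&lt;br /&gt;
If you only want the numerical rows, without the header comments, a simple filter is enough (sketch):&lt;br /&gt;
&lt;br /&gt;
 grep -v &amp;quot;^#&amp;quot; o-80b_10Ry.qp&lt;br /&gt;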
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can speed up the convergence with the k-grid, briefly consider again the expression of the exchange part of the self-energy:&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually, in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that lies outside this small sphere. This is usually&lt;br /&gt;
not problematic, because for a large number of q and k points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo.&lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, used to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic.&lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a dense k-point grid the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with the gamma point only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy,&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show&lt;br /&gt;
the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; alone when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag to your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation, this time using the RIM-W algorithm:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential in the correlation part of the self-energy.&lt;br /&gt;
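&lt;br /&gt;
As a quick cross-check (a minimal sketch, assuming the three runs above produced &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;o-80b_10Ry_rimw.qp&amp;lt;/code&amp;gt; in the working directory), you can print the quasiparticle energies of the three runs side by side and compare the gaps by hand:&lt;br /&gt;
&lt;br /&gt;
 for tag in 80b_10Ry 80b_10Ry_rim 80b_10Ry_rimw; do&lt;br /&gt;
   echo &amp;quot;== $tag ==&amp;quot;&lt;br /&gt;
   grep -v &amp;quot;^#&amp;quot; o-${tag}.qp    # skip the comment lines, keep only the data rows&lt;br /&gt;
 done&lt;br /&gt;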
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure on a larger number of bands than in the previous calculation.&lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the SLURM submission script &amp;lt;code&amp;gt;job_parallel.sh&amp;lt;/code&amp;gt;, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; refers to the screening function itself (it stands for χ, since it is a response function), and &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; to the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
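&lt;br /&gt;
Note that the product of the entries in each &amp;lt;code&amp;gt;*_CPU&amp;lt;/code&amp;gt; string has to match the total number of MPI tasks. As a sketch (hypothetical values, not the ones used in the script below), with 8 tasks you could also distribute the response function over both conduction and valence bands:&lt;br /&gt;
&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 4 2&amp;quot;     # [PARALLEL] 4 tasks over c and 2 over v: 1*1*1*4*2 = 8&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;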
&lt;br /&gt;
We start by calculating the QP corrections using MPI tasks only, with a single OpenMP thread per task. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=8&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
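&lt;br /&gt;
For instance, once the job has started you can list the output directory created through the &amp;lt;code&amp;gt;-C&amp;lt;/code&amp;gt; option (a sketch, assuming the 8-task label used above):&lt;br /&gt;
&lt;br /&gt;
 ls run_MPI8_OMP1.out         # report (r-*) file; the o-* output files appear at the end of the run&lt;br /&gt;
 ls run_MPI8_OMP1.out/LOG     # one log file (l-*) per MPI task&lt;br /&gt;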
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying &amp;lt;code&amp;gt;cpus-per-task&amp;lt;/code&amp;gt; in the submission file. The product of the number of OpenMP threads and the number of MPI tasks gives the total number of CPUs.&lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute the 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can be used to directly specify the number of threads in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI run from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
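&lt;br /&gt;
One convenient way to compare the wall times of the different runs, once the jobs have finished, is to query the SLURM accounting database (a sketch; replace the job IDs with the ones returned by &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
 sacct -j 123456,123457 --format=JobID,JobName,NTasks,Elapsed,State&lt;br /&gt;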
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower the memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==Step 3: Running on GPU== (hidden: no GPU partition at the Yambo 2025 school)&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs.&lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to considerable speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task. OpenMP threading is allowed, but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running over 4 GPU cards, distributed with 2 MPI tasks, and using 1 thread. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster: about 2 minutes instead of 2 min 30 s for the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8768</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8768"/>
		<updated>2025-05-19T21:34:53Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. Aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material based on:&lt;br /&gt;
*Acceleration techniques of GW, some of which specific for 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once terminated the computation, you can now inspect the output file &amp;lt;code&amp;gt; o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 here is problematic for the presence of a 1/q from the Coulomb potential that diverges, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code lost part of the integral out of the circle. This usually &lt;br /&gt;
is not problematic because for a large number of q and k point the missing term goes to zero.&lt;br /&gt;
However in system that requires few k-points or even only the gamma one, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;. So as&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of component of the Coulomb potential we want integrate numerically and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interaction between periodically repeated copies of the simulation supercell along the z-direction (that in this case is the non periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of large k points the final results will be the same of the standard method.&lt;br /&gt;
However this correction is important for systems that converge with few k-points or with gamma only and it is applied both at the exchange and correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration around q=0 of the Coulomb potential via the RIM approach does not work well for the correlation part of self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
an horizontal dashed line. Orange lines: RIM-W method.  The grey shaded regions show&lt;br /&gt;
the converge tolerance (±50 meV) centered at the centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, it has been developed a method specific for 2D systems that performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7| Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm adding a further flag to your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; be always smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV. A large decrease compared to the value obtained when performing a RIM integration of only the bare Coulomb potential within the correlation part of self-energy.&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy on Yambo. As a test calculation, we compute the full band structure on a larger number of bands with respect the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Differently to the approach used up to now, we will not work interactively but rather we will submit a job script as is commonly done when using Yambo on clusters. So, now exit the interactive mode and in the login node access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, instead the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to activate the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, you expect to a massive gain with OpenMP if you are already close to an efficient MPI scaling&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
#==Step 3: Running on GPU== hidden no gpu partition in Yambo 2025 school&lt;br /&gt;
#&lt;br /&gt;
#For this part of the tutorial, we will repeat the calculation of before, making use of gpus. &lt;br /&gt;
&lt;br /&gt;
#So now move to:&lt;br /&gt;
&lt;br /&gt;
# 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
#Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the #calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
# vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
# #!/bin/bash&lt;br /&gt;
# #SBATCH --nodes=1&lt;br /&gt;
# #SBATCH --ntasks-per-node=2&lt;br /&gt;
# #SBATCH --cpus-per-task=1&lt;br /&gt;
# #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
#each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, therefore we have still #the total number of tasks equal to 8. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here in fact, we are not #using OpenMP so &amp;lt;/code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
#In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific #compilation of the code.&lt;br /&gt;
&lt;br /&gt;
# module purge&lt;br /&gt;
# module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
# export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
#In general, the gpu compilation might be different on your machine, or the executable may be the same with no need to load an additional #modules. In conclusion, we are running over 4 GPU cards, distributed with 2 MPI tasks, and using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster: about 2 minutes instead of 2 minutes and 30 seconds for the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8767</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8767"/>
		<updated>2025-05-19T21:33:39Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 3: Running on GPU */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*Acceleration techniques for GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt; o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually, in Yambo this integration is performed analytically inside a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside the sphere. This is usually&lt;br /&gt;
not problematic, because for a large number of q- and k-points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the Monte Carlo integration. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of dense k-point grids the final result is the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show&lt;br /&gt;
the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential in the correlation part of the self-energy.&lt;br /&gt;
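&lt;br /&gt;
For reference, the gaps obtained in the three runs of this step (all values quoted above) are:&lt;br /&gt;
&lt;br /&gt;
 80b_10Ry        (no RIM)    2.483 eV&lt;br /&gt;
 80b_10Ry_rim    (RIM)       4.116 eV&lt;br /&gt;
 80b_10Ry_rimw   (RIM-W)     2.806 eV&lt;br /&gt;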
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use parallel strategies in Yambo. As a test calculation, we compute the full band structure for a larger number of bands with respect to the previous calculation.&lt;br /&gt;
&lt;br /&gt;
Differently from the approach used up to now, we will not work interactively; rather, we will submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see that we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function; &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ, since it is a response function); &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; is the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
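&lt;br /&gt;
As an illustration of how the role strings work, with 8 MPI tasks one could for instance split the response function and the self-energy over two roles at once, instead of assigning all tasks to a single role as done in the script below (this alternative split is only a hypothetical example, not part of the tutorial files):&lt;br /&gt;
&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 4 2&amp;quot;     # 4 tasks over conduction and 2 over valence bands = 8 MPI tasks&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;&lt;br /&gt;
 SE_CPU= &amp;quot;1 4 2&amp;quot;               # 4 tasks over quasi-particles and 2 over bands = 8 MPI tasks&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;&lt;br /&gt;
&lt;br /&gt;
In any case, the product of the numbers assigned to the roles in each CPU string should match the total number of MPI tasks.&lt;br /&gt;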
&lt;br /&gt;
We start by calculating the QP corrections using only MPI tasks and a single OpenMP thread. Therefore, create the submission script as follows:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=8&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
We don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Alternatively, any positive number directly specifies the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.&lt;br /&gt;
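&lt;br /&gt;
For example, a hypothetical allocation that keeps the same 8 CPUs but shifts the balance towards OpenMP (2 MPI tasks with 4 threads each) reduces the number of MPI tasks, and therefore the amount of memory replicated across tasks on the node:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=4&lt;br /&gt;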
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs.&lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In Yambo, each GPU is driven by a single MPI task, so the number of MPI tasks should match the number of GPUs actually used. OpenMP threading is allowed, but too many threads reduce efficiency; here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, with this script we request 4 GPU cards and run 2 MPI tasks, each using 1 thread.&lt;br /&gt;
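&lt;br /&gt;
The launch line inside the script is then analogous to the CPU case; the file names below are only assumptions, check gpu_job.sh for the actual ones:&lt;br /&gt;
&lt;br /&gt;
 mpirun -n 2 yambo -F gw.in -J MPI2_OMP1 -C MPI2_OMP1.out&lt;br /&gt;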
&lt;br /&gt;
The calculation should be faster: about 2 minutes instead of 2 minutes and 30 seconds for the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Yambopy_tutorial:_band_structures&amp;diff=8687</id>
		<title>Yambopy tutorial: band structures</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Yambopy_tutorial:_band_structures&amp;diff=8687"/>
		<updated>2025-05-18T14:44:06Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
This tutorial will show you how to visualise wave-function and band-structure related information, such as atomic, orbital and spin projections, following a DFT Quantum Espresso calculation using the &amp;quot;qepy&amp;quot; module of Yambopy. It also contains a section about Yambo and GW band structures.&lt;br /&gt;
&lt;br /&gt;
The full tutorial, including the Quantum espresso and Yambo databases that we will read, can be downloaded and extracted from the yambo website:&lt;br /&gt;
 wget https://media.yambo-code.eu/educational/tutorials/files/databases_qepy.tar.gz&lt;br /&gt;
 tar -xvzf databases_qepy.tar.gz&lt;br /&gt;
 cd databases_qepy&lt;br /&gt;
&lt;br /&gt;
==Tutorial 1. BN (semiconductor). Band structure==&lt;br /&gt;
&lt;br /&gt;
First, enter the folder&lt;br /&gt;
 cd databases_qepy/bn-semiconductor&lt;br /&gt;
and have a look at the file&lt;br /&gt;
 vim plot-qe-bands.py&lt;br /&gt;
The qepy classes are useful both to execute Quantum Espresso and to analyze its results. Enter the python environment by typing &amp;lt;code&amp;gt;python&amp;lt;/code&amp;gt;; then the full qepy library is imported by simply doing:&lt;br /&gt;
&lt;br /&gt;
 from qepy import *&lt;br /&gt;
&lt;br /&gt;
===Plot Band structure===&lt;br /&gt;
&lt;br /&gt;
The qepy class &#039;&#039;&#039;PwXML&#039;&#039;&#039; reads the data file generated by Quantum Espresso and post-processes the data. The class is instanced by doing:&lt;br /&gt;
&lt;br /&gt;
 xml = PwXML(prefix=&#039;bn&#039;, path=&#039;bands&#039;)&lt;br /&gt;
&lt;br /&gt;
The variable prefix corresponds to the same variable in the QE input, while the folder location is indicated by the variable path. In order to plot the bands, we also define the k-point path (in crystal coordinates) using the function Path:&lt;br /&gt;
&lt;br /&gt;
 npoints = 50&lt;br /&gt;
 path_kpoints = Path([ [[0.0, 0.0, 0.0],&#039;$\Gamma$&#039;],&lt;br /&gt;
                       [[0.5, 0.0, 0.0],&#039;M&#039;],&lt;br /&gt;
                       [[1./3,1./3,0.0],&#039;K&#039;],&lt;br /&gt;
                       [[0.0, 0.0, 0.0],&#039;$\Gamma$&#039;]], [int(npoints*2),int(npoints),int(sqrt(5)*npoints)])&lt;br /&gt;
&lt;br /&gt;
It is worth noting that the path should coincide with the path selected for the QE band calculation.&lt;br /&gt;
&lt;br /&gt;
In order to show the plot we call the &#039;&#039;&#039;plot_eigen&#039;&#039;&#039; method of the &#039;&#039;&#039;PwXML&#039;&#039;&#039; class:&lt;br /&gt;
&lt;br /&gt;
 xml.plot_eigen(path_kpoints)&lt;br /&gt;
&lt;br /&gt;
This function will automatically plot the bands, as shown below, when running the script:&lt;br /&gt;
&lt;br /&gt;
 python plot-qe-bands.py&lt;br /&gt;
&lt;br /&gt;
[[File:Bands BN 1.png| 400 px | center |Band structure of BN calculated at the level of DFT-LDA]]&lt;br /&gt;
&lt;br /&gt;
Alternatively, we can use the function &#039;&#039;&#039;plot_eigen_ax&#039;&#039;&#039;. This function requires as input a matplotlib &#039;&#039;&#039;figure&#039;&#039;&#039; object with given axes, as you will see in the next example.&lt;br /&gt;
&lt;br /&gt;
===Plot atomic orbital projected Band structure===&lt;br /&gt;
&lt;br /&gt;
In addition to the band structure, useful information regarding the atomic orbital nature of the electronic wave functions can be displayed using the class &#039;&#039;&#039;ProjwfcXML&#039;&#039;&#039;. &lt;br /&gt;
In order to make quantum espresso calculate the relevant data, we need to use the QE executable &#039;&#039;&#039;projwfc.x&#039;&#039;&#039;, which will create the file &#039;&#039;&#039;atomic_proj.xml&#039;&#039;&#039;. This executable projects the Kohn-Sham wavefunctions onto orthogonalized atomic orbitals, among other functionalities. The orbital indexing and ordering are explained in the input documentation of projwfc.x, which you are invited to check (https://www.quantum-espresso.org/Doc/INPUT_PROJWFC.html#idm94). We can run &#039;&#039;&#039;projwfc.x&#039;&#039;&#039; directly from the python script using the qepy class &#039;&#039;&#039;ProjwfcIn&#039;&#039;&#039;:&lt;br /&gt;
&lt;br /&gt;
 proj = ProjwfcIn(prefix=&#039;bn&#039;)&lt;br /&gt;
 proj.run(folder=&#039;bands&#039;)&lt;br /&gt;
&lt;br /&gt;
Be aware that this can take a while in a large system with many k-points. As for the class &#039;&#039;&#039;PwXML&#039;&#039;&#039;, we then define the path_kpoints and also the atomic orbitals onto which to project our bands. We have chosen to represent the projection weight onto the nitrogen (1) and boron (2) orbitals, which can be obtained with&lt;br /&gt;
&lt;br /&gt;
 # Automatic selection of the states&lt;br /&gt;
 atom_1 = band.get_states_helper(atom_query=[&#039;N&#039;])&lt;br /&gt;
 atom_2 = band.get_states_helper(atom_query=[&#039;B&#039;])&lt;br /&gt;
&lt;br /&gt;
The full list of orbitals is written in the file &#039;&#039;&#039;bands/prowfc.log&#039;&#039;&#039;. By inspecting it, you may also construct custom lists of states.&lt;br /&gt;
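&lt;br /&gt;
For instance, a hand-built selection could look like the following (the state indices here are purely hypothetical and must be checked against the log file):&lt;br /&gt;
&lt;br /&gt;
 # custom lists of projwfc states (hypothetical indices)&lt;br /&gt;
 atom_1 = [0, 1, 2, 3]&lt;br /&gt;
 atom_2 = [4, 5, 6, 7]&lt;br /&gt;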
&lt;br /&gt;
We have also defined the figure box&lt;br /&gt;
&lt;br /&gt;
 import matplotlib.pyplot as plt&lt;br /&gt;
 fig = plt.figure(figsize=(5,7))&lt;br /&gt;
 ax  = fig.add_axes( [ 0.12, 0.10, 0.70, 0.80 ])&lt;br /&gt;
&lt;br /&gt;
The class &#039;&#039;&#039;ProjwfcXML&#039;&#039;&#039; then runs as in this example:&lt;br /&gt;
&lt;br /&gt;
 band = ProjwfcXML(prefix=&#039;bn&#039;,path=&#039;bands&#039;,qe_version=&#039;7.0&#039;)&lt;br /&gt;
 band.plot_eigen(ax,path_kpoints=path_kpoints,cmap=&#039;viridis&#039;,selected_orbitals=atom_1,selected_orbitals_2=atom_2)&lt;br /&gt;
&lt;br /&gt;
We can now run the file:&lt;br /&gt;
&lt;br /&gt;
 python plot-qe-orbitals.py&lt;br /&gt;
&lt;br /&gt;
[[File:Bands AOP BN 1.png| 400 px | center | Atomic orbital projected band structure of monolayer BN]]&lt;br /&gt;
&lt;br /&gt;
We have chosen the colormap &#039;viridis&#039; (variable cmap). You see that the colormap goes from maximum &#039;&#039;&#039;selected_orbitals&#039;&#039;&#039; content (in this case nitrogen) to the maximum &#039;&#039;&#039;selected_orbitals_2&#039;&#039;&#039; content (in this case boron). &lt;br /&gt;
The colormap can be represented in many ways and qepy does not include a particular function for this. An example is:&lt;br /&gt;
&lt;br /&gt;
 import matplotlib as mpl&lt;br /&gt;
 cmap=plt.get_cmap(&#039;viridis&#039;)&lt;br /&gt;
 bx  = fig.add_axes( [ 0.88, 0.10, 0.05, 0.80 ])&lt;br /&gt;
 norm = mpl.colors.Normalize(vmin=0.,vmax=1.)&lt;br /&gt;
 cb1 = mpl.colorbar.ColorbarBase(bx, cmap=cmap, norm=norm,orientation=&#039;vertical&#039;,ticks=[0,1])&lt;br /&gt;
 cb1.set_ticklabels([&#039;B&#039;, &#039;N&#039;])&lt;br /&gt;
&lt;br /&gt;
Suppose now that we have run the G0W0 calculation from the last tutorial, and we want to represent the atomic weights on top of the quasiparticle band structure instead of the Kohn-Sham one. However, we do not have the same k-point grid in the G0W0 calculation and in the quantum espresso &amp;quot;band&amp;quot; calculation along a path. We can then extract the parameters for a scissor operator (this is done below) and feed them to the &#039;&#039;&#039;ProjwfcXML&#039;&#039;&#039; class together with the number of valence bands. Try uncommenting the following lines in the tutorial script:&lt;br /&gt;
&lt;br /&gt;
 # Add scissor operator to the bands from a G0W0 calculation&lt;br /&gt;
 scissor= [1.8985195950522469, 1.0265240811345133, 1.051588659878575]&lt;br /&gt;
 n_val = 4&lt;br /&gt;
 band.add_scissor(n_val,scissor)&lt;br /&gt;
  &lt;br /&gt;
Finally, we can also plot the band orbital character with size variation instead of a color scale. In this case we have to pass only the variable selected_orbitals (see the next tutorial).&lt;br /&gt;
&lt;br /&gt;
==Tutorial 2. Iron. Ferromagnetic metallic material==&lt;br /&gt;
&lt;br /&gt;
In the case of spin-polarized calculations we can plot the spin up and down band structures. We have included in the tutorial a small workflow example to run quantum espresso calculations from scratch. This is done using the classes Tasks and Flows developed in yambopy. You can check the file flow-iron.py for an example. &lt;br /&gt;
Yambopy includes basic predefined workflows to run the common QE and Yambo calculations. In this case we are using the class &#039;&#039;&#039;PwBandsTasks&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
 python flow-iron.py&lt;br /&gt;
&lt;br /&gt;
In order to plot the spin-polarized bands, after doing all the calculations from scratch with the flows (flow-iron.py), we can make the band plot by running the script:&lt;br /&gt;
&lt;br /&gt;
 python plot-qe-bands.py&lt;br /&gt;
&lt;br /&gt;
The class PwXML automatically detects the spin polarized case (nspin=2 in the QE input file). The spin up channel is displayed with red and the spin down channel with blue. In the case of iron we have selected this k-point path:&lt;br /&gt;
&lt;br /&gt;
 npoints = 50&lt;br /&gt;
 path_kpoints = Path([ [[0.0, 0.0, 0.0 ],&#039;G&#039;],&lt;br /&gt;
                       [[0.0, 0.0, 1.0 ],&#039;H&#039;],&lt;br /&gt;
                       [[1./2,0.0,1./2.],&#039;N&#039;],&lt;br /&gt;
                       [[0.0, 0.0, 0.0 ],&#039;G&#039;],&lt;br /&gt;
                       [[1./2, 1./2, 1./2 ],&#039;P&#039;],&lt;br /&gt;
                       [[1./2,0.0,1./2. ],&#039;N&#039;]], [npoints,npoints,npoints,npoints,npoints])&lt;br /&gt;
&lt;br /&gt;
 xml = PwXML(prefix=&#039;pw&#039;,path=&#039;bands/t0&#039;)&lt;br /&gt;
 xml.plot_eigen(path_kpoints)&lt;br /&gt;
&lt;br /&gt;
[[File:Figure 3-iron-bands.png| 600 px | center |Spin polarized band structure of iron calculated by DFT]]&lt;br /&gt;
&lt;br /&gt;
The analysis of the projected atomic orbitals is also implemented. In this case the results are more cumbersome, because the projection is separated into spin up and spin down channels.&amp;lt;br&amp;gt; &lt;br /&gt;
Let us first look at the file &#039;&#039;&#039;plot-qe-orbitals-size&#039;&#039;&#039;. &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NB: If you generated the quantum espresso databases using the flows instead of relying on the precomputed databases, you also need to rerun the quantum espresso executable projwfc.x to recompute the orbital projections. In this case, please uncomment the following lines in the script&#039;&#039;&#039; (plot-qe-orbitals-size.py) :&lt;br /&gt;
 #proj = ProjwfcIn(prefix=&#039;pw&#039;)&lt;br /&gt;
 #proj.run(folder=&#039;bands/t0&#039;)&lt;br /&gt;
&lt;br /&gt;
Now, we can use the dot size as a function of the weight of the orbitals&lt;br /&gt;
 # Automatic selection of the states&lt;br /&gt;
 s = band.get_states_helper(orbital_query=[&#039;s&#039;])&lt;br /&gt;
 p = band.get_states_helper(orbital_query=[&#039;p&#039;])&lt;br /&gt;
 d = band.get_states_helper(orbital_query=[&#039;d&#039;])&lt;br /&gt;
&lt;br /&gt;
and the plots are done with&lt;br /&gt;
 band = ProjwfcXML(prefix=&#039;pw&#039;,path=&#039;bands/t0&#039;,qe_version=&#039;7.0&#039;)&lt;br /&gt;
 band.plot_eigen(ax,path_kpoints=path_kpoints,&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;selected_orbitals=s,&amp;lt;/span&amp;gt;color=&#039;pink&#039;,color_2=&#039;black&#039;)&lt;br /&gt;
 band.plot_eigen(ax,path_kpoints=path_kpoints,&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;selected_orbitals=p,&amp;lt;/span&amp;gt;color=&#039;green&#039;,color_2=&#039;orange&#039;)&lt;br /&gt;
 band.plot_eigen(ax,path_kpoints=path_kpoints,&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;selected_orbitals=d,&amp;lt;/span&amp;gt;color=&#039;red&#039;,color_2=&#039;blue&#039;)&lt;br /&gt;
&lt;br /&gt;
As an example, we can select just the &#039;&#039;d&#039;&#039; orbitals by commenting the first two plots and then running the file:&lt;br /&gt;
&lt;br /&gt;
 plot-qe-orbitals-size.py&lt;br /&gt;
&lt;br /&gt;
[[File:Figure 4-iron-bands-size-d-orbitals.png|400px|center|Iron band structure. Size is proportional to the weights of the projection on atomic d-orbitals. Red (blue) is up (down) spin polarization.]]&lt;br /&gt;
&lt;br /&gt;
From the plot it is clear that &#039;&#039;d&#039;&#039; orbitals are mainly localized around the Fermi energy. The plot above is in red and blue, while the default choice in your script should be pink and black. You can experiment with the colors and other &#039;&#039;matplotlib&#039;&#039; plot options and also plot the other orbital types.&lt;br /&gt;
&lt;br /&gt;
Another option is to plot the orbital composition as a colormap by running the file:&lt;br /&gt;
&lt;br /&gt;
 plot-qe-orbitals-colormap.py&lt;br /&gt;
&lt;br /&gt;
[[File:Figure 5-iron-bands-colormap.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
Here we have added the p and d orbitals in the analysis:&lt;br /&gt;
&lt;br /&gt;
 p = band.get_states_helper(orbital_query=[&#039;p&#039;])&lt;br /&gt;
 d = band.get_states_helper(orbital_query=[&#039;d&#039;])&lt;br /&gt;
 band = ProjwfcXML(prefix=&#039;pw&#039;,path=&#039;bands/t0&#039;,qe_version=&#039;7.0&#039;)&lt;br /&gt;
 band.plot_eigen(ax,path_kpoints=path_kpoints,cmap=&#039;viridis&#039;,cmap2=&#039;rainbow&#039;,selected_orbitals=p,selected_orbitals_2=d)&lt;br /&gt;
&lt;br /&gt;
The colormap bar is added in the same way as in Tutorial 1 (see script), but this time we have a different colormap for each spin polarisation.&lt;br /&gt;
&lt;br /&gt;
==Tutorial 3: GW bands==&lt;br /&gt;
&lt;br /&gt;
Yambopy can be used either to run Yambo and QE calculations, or to analyse the results of QE and Yambo by dealing with their generated databases. This is done with a variety of classes included in the qepy (for QE) or yambopy (for Yambo) modules. &lt;br /&gt;
In the case of Yambo GW quasi-particle calculations, we can use the yambopy class &#039;&#039;&#039;YamboQPDB&#039;&#039;&#039; to read the database produced by the simulation.&lt;br /&gt;
Enter the folder&lt;br /&gt;
 cd ../gw-bands&lt;br /&gt;
We can use this class to find the scissor operator, plot the GW bands, and interpolate them on a smoother k-path. The example runs by typing:&lt;br /&gt;
&lt;br /&gt;
 python plot-qp.py&lt;br /&gt;
&lt;br /&gt;
See below for an explanation of the tutorial. As usual, we can import the qepy and yambopy libraries:&lt;br /&gt;
&lt;br /&gt;
  from qepy import *&lt;br /&gt;
  from yambopy import *&lt;br /&gt;
  import matplotlib.pyplot as plt&lt;br /&gt;
&lt;br /&gt;
We define the k-points path.&lt;br /&gt;
  npoints = 10&lt;br /&gt;
  path = Path([ [[  0.0,  0.0,  0.0],&#039;$\Gamma$&#039;],&lt;br /&gt;
                [[  0.5,  0.0,  0.0],&#039;M&#039;],&lt;br /&gt;
                [[1./3.,1./3.,  0.0],&#039;K&#039;],&lt;br /&gt;
                [[  0.0,  0.0,  0.0],&#039;$\Gamma$&#039;]], [int(npoints*2),int(npoints),int(sqrt(5)*npoints)] )&lt;br /&gt;
&lt;br /&gt;
Importantly, the number of points is a free choice. We can increase the variable npoints as we wish; it just means that the interpolation step will take more time. In order to analyse the GW results we need the file with the basic data of our Yambo calculation (&#039;&#039;&#039;SAVE/ns.db1&#039;&#039;&#039;) and the netCDF file with the quasi-particle results (&#039;&#039;&#039;ndb.QP&#039;&#039;&#039;). We load the data by calling the respective classes:&lt;br /&gt;
&lt;br /&gt;
 # Read Lattice information from SAVE&lt;br /&gt;
 lat  = YamboSaveDB.from_db_file(folder=&#039;SAVE&#039;,filename=&#039;ns.db1&#039;)&lt;br /&gt;
 # Read QP database&lt;br /&gt;
 ydb  = YamboQPDB.from_db(filename=&#039;ndb.QP&#039;,folder=&#039;qp-gw&#039;)&lt;br /&gt;
(in the yambopy module, each class is specialised to read a specific Yambo database)&lt;br /&gt;
&lt;br /&gt;
The first option is to plot the energies and calculate the ideal Kohn-Sham to GW scissor operator. We need to select the index of the top valence band:&lt;br /&gt;
&lt;br /&gt;
 n_top_vb = 4&lt;br /&gt;
 ydb.plot_scissor_ax(ax,n_top_vb)&lt;br /&gt;
&lt;br /&gt;
Yambopy displays the fit and also the slope of each fit. Notice that this is also a test of whether the GW calculation is running well. &#039;&#039;&#039;If the dependence is not linear you should double-check your results!&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:Figure 6-slope-scissor.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
In this case the slope is:&lt;br /&gt;
&lt;br /&gt;
 valence bands:&lt;br /&gt;
 slope:     1.0515886598785766&lt;br /&gt;
 conduction bands:&lt;br /&gt;
 slope:     1.026524081134514&lt;br /&gt;
 scissor list (shift,c,v) [eV,adim,adim]: [1.8985204833551723, 1.026524081134514, 1.0515886598785766]&lt;br /&gt;
&lt;br /&gt;
In addition to the scissor operator, we can plot the GW (and DFT) band structure along the path. The first choice would be to plot the actual GW calculations, without interpolation, to check whether the results are meaningful (or not). This plot is independent of the number of k-points (&#039;&#039;&#039;npoints&#039;&#039;&#039;) that we want to use in the interpolation. The class YamboQPDB finds the calculated points that belong to the path and plots them. Be aware that with coarse grids the class might not find any point, and the function will not work.&lt;br /&gt;
&lt;br /&gt;
 ks_bs_0, qp_bs_0 = ydb.get_bs_path(lat,path)&lt;br /&gt;
 ks_bs_0.plot_ax(ax,legend=True,c_bands=&#039;r&#039;,label=&#039;KS&#039;)&lt;br /&gt;
 qp_bs_0.plot_ax(ax,legend=True,c_bands=&#039;b&#039;,label=&#039;QP-GW&#039;)&lt;br /&gt;
&lt;br /&gt;
[[File:Figure 7-GW-band-structure-non-interpolated.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
The interpolation of the DFT and GW band structures looks similar:&lt;br /&gt;
&lt;br /&gt;
 ks_bs, qp_bs = ydb.interpolate(lat,path,what=&#039;QP+KS&#039;,lpratio=20)&lt;br /&gt;
 ks_bs.plot_ax(ax,legend=True,c_bands=&#039;r&#039;,label=&#039;KS&#039;)&lt;br /&gt;
 qp_bs.plot_ax(ax,legend=True,c_bands=&#039;b&#039;,label=&#039;QP-GW&#039;)&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;lpratio&#039;&#039;&#039; can be increased if the interpolation does not work as well as intended. The SKW interpolation scheme is the same one implemented in abipy.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure 8-GW-band-structure-interpolated.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
Finally, we can compare the calculated GW eigenvalues with the interpolation.&lt;br /&gt;
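&lt;br /&gt;
A minimal way to produce such a comparison, reusing the band-structure objects defined above (the figure size and colours here are just a choice, not the tutorial script), is to draw the interpolated and the calculated QP bands on the same axes:&lt;br /&gt;
&lt;br /&gt;
 import matplotlib.pyplot as plt&lt;br /&gt;
 fig = plt.figure(figsize=(5,7))&lt;br /&gt;
 ax  = fig.add_axes( [ 0.15, 0.10, 0.80, 0.80 ])&lt;br /&gt;
 # interpolated QP bands as continuous reference&lt;br /&gt;
 qp_bs.plot_ax(ax,legend=True,c_bands=&#039;b&#039;,label=&#039;QP-GW interpolated&#039;)&lt;br /&gt;
 # explicitly calculated QP points on top&lt;br /&gt;
 qp_bs_0.plot_ax(ax,legend=True,c_bands=&#039;g&#039;,label=&#039;QP-GW calculated&#039;)&lt;br /&gt;
 plt.show()&lt;br /&gt;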
&lt;br /&gt;
[[File:Figure 8-GW-band-structure-comparison.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
==Links==&lt;br /&gt;
* Back to [[Rome 2023#Tutorials]]&lt;br /&gt;
* Back to [[ICTP 2022#Tutorials]]&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8686</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8686"/>
		<updated>2025-05-18T14:35:53Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* RIM-W */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*Acceleration techniques for GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt; o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually, in Yambo this integration is performed analytically inside a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside the sphere. This is usually&lt;br /&gt;
not problematic, because for a large number of q- and k-points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the Monte Carlo integration. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of dense k-point grids the final result is the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show&lt;br /&gt;
the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed? You should expect a GW gap of 2.806 eV, a large decrease compared to the value obtained when the RIM integration is applied only to the bare Coulomb potential in the correlation part of the self-energy.&lt;br /&gt;
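&lt;br /&gt;
To see the three gaps side by side, you can adapt the same sketch into a small loop over the job labels used so far (again assuming the &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; column layout described above):&lt;br /&gt;
&lt;br /&gt;
 # sketch: compare the k=7 gap (bands 13-14) of the three runs&lt;br /&gt;
 for tag in 80b_10Ry 80b_10Ry_rim 80b_10Ry_rimw; do&lt;br /&gt;
   awk -v t=$tag &#039;/^#/ {next} $1==7 {if ($2==13) v=$3+$4; if ($2==14) c=$3+$4} END {print t, c-v, &amp;quot;eV&amp;quot;}&#039; o-$tag.qp&lt;br /&gt;
 done&lt;br /&gt;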
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure, with a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode now and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; to the screening function itself (it stands for χ, since it is a response function), and &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; to the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks equals the total number of CPUs. &lt;br /&gt;
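&lt;br /&gt;
As a quick sanity check (just a sketch relying on standard SLURM environment variables), you can print this product from inside the job script:&lt;br /&gt;
&lt;br /&gt;
 # total CPUs used by the job = MPI tasks x OpenMP threads per task&lt;br /&gt;
 echo &amp;quot;Total CPUs: $(( SLURM_NTASKS * SLURM_CPUS_PER_TASK ))&amp;quot;&lt;br /&gt;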
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
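&lt;br /&gt;
One convenient way to scan several combinations is to override the #SBATCH directives from the command line; the script adapts automatically, since it reads the MPI/OpenMP setup from the SLURM environment variables. The job IDs below are placeholders to be replaced with the ones printed by sbatch:&lt;br /&gt;
&lt;br /&gt;
 # sketch: submit a few MPI/OpenMP combinations on 8 CPUs&lt;br /&gt;
 sbatch --ntasks-per-node=8 --cpus-per-task=1 job_parallel.sh&lt;br /&gt;
 sbatch --ntasks-per-node=4 --cpus-per-task=2 job_parallel.sh&lt;br /&gt;
 sbatch --ntasks-per-node=2 --cpus-per-task=4 job_parallel.sh&lt;br /&gt;
 # once the jobs have finished, compare the elapsed times&lt;br /&gt;
 sacct -j JOBID1,JOBID2,JOBID3 --format=JobID,JobName,Elapsed&lt;br /&gt;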
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, so the number of MPI tasks sets how many of the available GPUs are actually used. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running with 2 MPI tasks (one GPU card per task) and 1 OpenMP thread.&lt;br /&gt;
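&lt;br /&gt;
As an optional check (a sketch, not part of the original script), you can print which GPUs SLURM exposed to the job by adding a couple of lines before the yambo launch:&lt;br /&gt;
&lt;br /&gt;
 # list the GPUs visible to the job (expect four entries with --gres=gpu:4)&lt;br /&gt;
 echo &amp;quot;CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES&amp;quot;&lt;br /&gt;
 nvidia-smi --query-gpu=index,name --format=csv&lt;br /&gt;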
&lt;br /&gt;
The calculation should be faster: about 2 minutes instead of the 2 min 30 s of the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8684</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8684"/>
		<updated>2025-05-18T14:29:31Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* RIM-W */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*GW acceleration techniques, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has terminated, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
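&lt;br /&gt;
Schematically, and up to prefactors, the quantity to be integrated over the BZ has the form below, where the matrix elements &amp;lt;math&amp;gt;\rho_{nv}&amp;lt;/math&amp;gt; vary slowly with q and &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; is the (cutoff) Coulomb potential, which behaves as 1/q at small q; this is only a sketch of the structure of the term, not the exact expression shown above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Sigma^{x}_{n\mathbf{k}} \sim -\sum_{v}\int_{\mathrm{BZ}} d\mathbf{q}\, \left|\rho_{nv}(\mathbf{k},\mathbf{q})\right|^{2}\, v(\mathbf{q})&amp;lt;/math&amp;gt;&lt;br /&gt;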
&lt;br /&gt;
You can notice that the integration around q=0 is problematic here due to the presence of a diverging 1/q term coming from the Coulomb potential, while all the other terms vary slowly with q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside the sphere. This is usually&lt;br /&gt;
not problematic because, for a large number of q- and k-points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even the Gamma point only, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with Gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy,&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb|Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods. The extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
since the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; itself goes as q around q=0 and combines with the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed that performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag to your input-generation command:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure, with a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode now and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; to the screening function itself (it stands for χ, since it is a response function), and &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; to the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks equals the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, so the number of MPI tasks sets how many of the available GPUs are actually used. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running with 2 MPI tasks (one GPU card per task) and 1 OpenMP thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster: about 2 minutes instead of the 2 min 30 s of the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8683</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=8683"/>
		<updated>2025-05-18T14:28:02Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* RIM */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*GW acceleration techniques, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has terminated, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic here due to the presence of a diverging 1/q term coming from the Coulomb potential, while all the other terms vary slowly with q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside the sphere. This is usually&lt;br /&gt;
not problematic because, for a large number of q- and k-points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even the Gamma point only, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with Gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.116 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy,&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb|Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods. The extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
since the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; itself goes as q around q=0 and combines with the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed that performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag to your input-generation command:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure, with a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode now and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; to the screening function itself (it stands for χ, since it is a response function), and &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; to the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to turn on OpenMP threading by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and the number of MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
For example, we can distribute 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Alternatively, any positive number directly specifies the number of threads to be used in the corresponding section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
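For instance, a pure OpenMP run on 8 cores could be requested by keeping a single MPI task and assigning all the cores to it (only the relevant SBATCH lines are shown; the rest of the script is unchanged):&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=1&lt;br /&gt;
 #SBATCH --cpus-per-task=8&lt;br /&gt;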
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node, since threads share the memory of their MPI task. You can try to increase the OpenMP share of threads if you run into Out Of Memory errors.&lt;br /&gt;
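For example, keeping the same total of 8 cores but shifting the balance towards OpenMP reduces the number of MPI tasks, each of which holds its own copy of the largest arrays (again, only the relevant SBATCH lines are shown):&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=4&lt;br /&gt;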
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, so the number of MPI tasks should match the number of GPUs actually used. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be set up differently on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running with 2 MPI tasks, each driving one of the available GPU cards, and a single OpenMP thread per task.&lt;br /&gt;
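&lt;br /&gt;
To double-check that the GPU-enabled executable is the one actually picked up after exporting the PATH, a quick sanity check is:&lt;br /&gt;
 which yambo&lt;br /&gt;
which should point to the 5.2-gpu bin directory set above.&lt;br /&gt;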
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 min 30 s of the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=First_steps:_walk_through_from_DFT(standalone)&amp;diff=8679</id>
		<title>First steps: walk through from DFT(standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=First_steps:_walk_through_from_DFT(standalone)&amp;diff=8679"/>
		<updated>2025-05-18T14:21:02Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Input file generator */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In this tutorial you will learn how to calculate optical spectra using Yambo, starting from a DFT calculation and ending with a look at local field effects in the optical response. &lt;br /&gt;
&lt;br /&gt;
== System characteristics ==&lt;br /&gt;
We will use a 3D system (bulk hBN) and a 2D system (hBN sheet). &lt;br /&gt;
&lt;br /&gt;
[[File:HBN-bulk-3x3-annotated.png|x200px|Atomic structure of bulk hBN]]&lt;br /&gt;
[[File:HBN2.png|x200px|Atomic structure of 2D hBN]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hexagonal boron nitride - hBN&#039;&#039;&#039;:&lt;br /&gt;
* HCP lattice, ABAB stacking&lt;br /&gt;
* Four atoms per cell, B and N (16 electrons)&lt;br /&gt;
* Lattice constants: &#039;&#039;a&#039;&#039; = 4.716 [a.u.], &#039;&#039;c/a&#039;&#039; = 2.582&lt;br /&gt;
* Plane wave cutoff 40 Ry (~1500 RL vectors in wavefunctions)&lt;br /&gt;
* SCF run: shifted &#039;&#039;6x6x2&#039;&#039; grid (12 k-points) with 8 bands &lt;br /&gt;
* Non-SCF run: gamma-centred &#039;&#039;6x6x2&#039;&#039; (14 k-points) grid with 100 bands&lt;br /&gt;
&lt;br /&gt;
=== Prerequisites ===&lt;br /&gt;
&#039;&#039;&#039;You will need&#039;&#039;&#039;:&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* PWSCF input files and pseudopotentials for hBN bulk &lt;br /&gt;
* &amp;lt;code&amp;gt;pw.x&amp;lt;/code&amp;gt; executable, version 5.0 or later&lt;br /&gt;
* &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; executables&lt;br /&gt;
* &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; for plotting spectra&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* Before starting, [[Get Tutorial files CECAM2021|get tutorial files]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
* Before starting, get the hBN tutorial files [https://www.yambo-code.eu/wiki/index.php/Tutorials#Tutorial_files here]&lt;br /&gt;
* &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; executable&lt;br /&gt;
* &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; for plotting spectra&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
==Download the Files==&lt;br /&gt;
&lt;br /&gt;
Download and unpack the followint files:&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN.tar.gz hBN.tar.gz] [15 MB],&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN-2D.tar.gz hBN-2D.tar.gz] [8,6 MB]&lt;br /&gt;
&lt;br /&gt;
In the next days you could also use this file which you may like to download now&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN-convergence-kpoints.tar.gz hBN-convergence-kpoints.tar.gz] [254 MB]&lt;br /&gt;
&lt;br /&gt;
After downloading the tar.gz files just unpack them in the &#039;&#039;&#039;YAMBO_TUTORIALS&#039;&#039;&#039; folder. For example&lt;br /&gt;
 $ mkdir YAMBO_TUTORIALS&lt;br /&gt;
 $ mv hBN.tar.gz YAMBO_TUTORIALS&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ tar -xvfz hBN.tar.gz&lt;br /&gt;
 $ ls YAMBO_TUTORIALS&lt;br /&gt;
   hBN&lt;br /&gt;
 $ cd hBN&lt;br /&gt;
 $ ls&lt;br /&gt;
 $ PWSCF YAMBO&lt;br /&gt;
&lt;br /&gt;
(Advanced users can download and install all tutorial files using git. See the main  [[Tutorials#Files|Tutorial Files]] page.)&lt;br /&gt;
&lt;br /&gt;
Now you can go directly in &#039;&#039;&#039;YAMBO&#039;&#039;&#039; folder where you can find the &#039;&#039;&#039;SAVE&#039;&#039;&#039; folder which is needed to start  and go directly to &#039;&#039;&#039;[Initialization of Yambo databases]&#039;&#039;&#039;  below, which is always the first step you have to perform for any simulation using the Yambo code.&lt;br /&gt;
&lt;br /&gt;
Or if you wish you can learn  also how to start from the DFT simulations doing a scf and nscf calculation, entering in &#039;&#039;&#039;PWSCF&#039;&#039;&#039; folder. In this way you will see how you can create the &#039;&#039;&#039;SAVE&#039;&#039;&#039; folder starting from *.save directory produced by pw.x.&lt;br /&gt;
&lt;br /&gt;
==DFT calculation of bulk hBN and conversion to Yambo==&lt;br /&gt;
&lt;br /&gt;
In this module you will learn how to generate the Yambo &#039;&#039;SAVE&#039;&#039; folder for bulk hBN starting from a PWscf calculation.&lt;br /&gt;
&lt;br /&gt;
=== DFT calculations ===&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/PWSCF&lt;br /&gt;
 $ ls&lt;br /&gt;
 Inputs		Pseudos		PostProcessing		References&lt;br /&gt;
 hBN_scf.in	hBN_nscf.in     hBN_scf_plot_bands.in  hBN_nscf_plot_bands.in &lt;br /&gt;
&lt;br /&gt;
First run the SCF calculation to generate the ground-state charge density, occupations, Fermi level, and so on:&lt;br /&gt;
 $ pw.x &amp;lt; hBN_scf.in &amp;gt; hBN_scf.out&lt;br /&gt;
Inspection of the output shows that the valence band maximum lies at 5.06eV.&lt;br /&gt;
&lt;br /&gt;
Next run a non-SCF calculation to generate a set of Kohn-Sham eigenvalues and eigenvectors for both occupied and unoccupied states (100 bands): &lt;br /&gt;
 $ pw.x &amp;lt; hBN_nscf.in &amp;gt; hBN_nscf.out                  &#039;&#039;(serial run, ~1 min) OR&#039;&#039;&lt;br /&gt;
 $ mpirun -np 2 pw.x &amp;lt; hBN_nscf.in &amp;gt; hBN_nscf.out       &#039;&#039;(parallel run, 40s)&#039;&#039;&lt;br /&gt;
Here we use a &#039;&#039;6x6x2&#039;&#039; grid giving 14 k-points, but denser grids should be used for checking convergence of Yambo runs.&lt;br /&gt;
&lt;br /&gt;
Note the presence of the following flags in the input file:&lt;br /&gt;
 wf_collect=.true.&lt;br /&gt;
 force_symmorphic=.true.&lt;br /&gt;
 diago_thr_init=5.0e-6,&lt;br /&gt;
 diago_full_acc=.true.&lt;br /&gt;
which are needed for generating the Yambo databases accurately. Full explanations of these variables are given on the [http://www.quantum-espresso.org/wp-content/uploads/Doc/INPUT_PW.html quantum-ESPRESSO input variables page]. &lt;br /&gt;
&lt;br /&gt;
After these two runs, you should have a &#039;&#039;hBN.save&#039;&#039; directory:&lt;br /&gt;
 $ ls hBN.save&lt;br /&gt;
 data-file.xml charge-density.dat gvectors.dat B.pz-vbc.UPF N.pz-vbc.UPF&lt;br /&gt;
 K00001	K00002 .... 	K00035	K00036&lt;br /&gt;
&lt;br /&gt;
=== Conversion to Yambo format ===&lt;br /&gt;
Once you have performed a nscf simulation with pw.x the PWscf &#039;&#039;bBN.save&#039;&#039; should not be empty and you can then convert it to the Yambo format using the &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt; executable (pwscf to yambo), found in the yambo &#039;&#039;bin&#039;&#039; directory.&lt;br /&gt;
Enter &#039;&#039;hBN.save&#039;&#039; and launch &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd hBN.save&lt;br /&gt;
 $ p2y&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; DBs path set to .&lt;br /&gt;
 &amp;lt;--%-&amp;gt; Index file set to data-file.xml&lt;br /&gt;
 &amp;lt;--%-&amp;gt; Header/K-points/Energies... done&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB1 (Gvecs and more) ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; ... Database done&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB2 (wavefunctions)  ... done ==&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB3 (PseudoPotential) ... done ==&lt;br /&gt;
 &amp;lt;--%-&amp;gt;  == P2Y completed ==&lt;br /&gt;
&lt;br /&gt;
This output repeats some information about the system and generates a &#039;&#039;SAVE&#039;&#039; directory:&lt;br /&gt;
 $ ls SAVE&lt;br /&gt;
 ns.db1  ns.wf  ns.kb_pp_pwscf&lt;br /&gt;
 ns.wf_fragments_1_1 ...&lt;br /&gt;
 ns.kb_pp_pwscf_fragment_1 ...&lt;br /&gt;
These files, with an &#039;&#039;n&#039;&#039; prefix, indicate that they are in netCDF format, and thus not human readable. However, they are perfectly transferable across different architectures. You can check that the databases contain the information you expect by launching Yambo using the &amp;lt;code&amp;gt;-D&amp;lt;/code&amp;gt; option:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -D&lt;br /&gt;
 [RD./SAVE//ns.db1]------------------------------------------&lt;br /&gt;
 Bands                           : 100&lt;br /&gt;
 K-points                        : 14&lt;br /&gt;
 G-vectors             [RL space]:  8029&lt;br /&gt;
 Components       [wavefunctions]: 1016&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./SAVE//ns.wf]-------------------------------------------&lt;br /&gt;
 Fragmentation                    :yes&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./SAVE//ns.kb_pp_pwscf]----------------------------------&lt;br /&gt;
 Fragmentation                    :yes&lt;br /&gt;
 - S/N 006626 -------------------------- v.04.01.02 r.00000 -&lt;br /&gt;
&lt;br /&gt;
In practice we suggest to move the &#039;&#039;SAVE&#039;&#039; folder into a new clean folder. &lt;br /&gt;
&lt;br /&gt;
In this tutorial however, we ask instead that you continue using a &#039;&#039;SAVE&#039;&#039; folder that we prepared previously:&lt;br /&gt;
 $ cd ../../YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Initialization of Yambo databases==&lt;br /&gt;
&amp;lt;!-- Use the &#039;&#039;SAVE&#039;&#039; folders that are already provided, rather than any ones you may have generated previously.  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Every Yambo run &#039;&#039;&#039;must&#039;&#039;&#039; start with this step. Go to the folder &#039;&#039;containing&#039;&#039; the hBN-bulk &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory:&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: do not run yambo from &#039;&#039;inside&#039;&#039; the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; folder!&lt;br /&gt;
&#039;&#039;&#039;This is the wrong way .. &#039;&#039;&#039;&lt;br /&gt;
 $ cd SAVE&lt;br /&gt;
 $ yambo&lt;br /&gt;
 yambo: cannot access CORE database (SAVE/*db1 and/or SAVE/*wf)&lt;br /&gt;
In fact, if you ever see such a message, it usually means you are trying to launch Yambo &#039;&#039;&#039;from the wrong place&#039;&#039;&#039;.&lt;br /&gt;
 $ cd ..&lt;br /&gt;
&lt;br /&gt;
Now you are in the proper place and&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
you can simply launch the code&lt;br /&gt;
 $ yambo &lt;br /&gt;
This will run the initialization (setup) &#039;&#039;runlevel&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
===Run-time output===&lt;br /&gt;
This is typically written to standard output (on screen) and tracks the progress of the run in real time:&lt;br /&gt;
 &amp;lt;---&amp;gt; [01] MPI/OPENMP structure, Files &amp;amp; I/O Directories&lt;br /&gt;
 &amp;lt;---&amp;gt; [02] CORE Variables Setup&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.01] Unit cells&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.02] Symmetries&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.03] Reciprocal space&lt;br /&gt;
 &amp;lt;---&amp;gt; Shells finder |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.04] K-grid lattice&lt;br /&gt;
 &amp;lt;---&amp;gt; Grid dimensions      :   6   6   2&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.05] Energies &amp;amp; Occupations&lt;br /&gt;
 &amp;lt;---&amp;gt; [03] Transferred momenta grid and indexing&lt;br /&gt;
 &amp;lt;---&amp;gt; BZ -&amp;gt; IBZ reduction |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [03.01] X indexes&lt;br /&gt;
 &amp;lt;---&amp;gt; X [eval] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; X[REDUX] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [03.01.01] Sigma indexes&lt;br /&gt;
 &amp;lt;---&amp;gt; Sigma [eval] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; Sigma[REDUX] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [04] Timing Overview&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Memory Overview&lt;br /&gt;
 &amp;lt;---&amp;gt; [06] Game Over &amp;amp; Game summary&lt;br /&gt;
Specific runlevels are indicated with numeric labels like [02.02]. &amp;lt;br&amp;gt;&lt;br /&gt;
The hashes (#) indicate progress of the run in Wall Clock time, indicating the elapsed (E) and expected (X) time to complete a runlevel, and the percentage of the task complete.&lt;br /&gt;
In this case the simulation is so fast that there is not much output to follow. On longer simulations you will be able to appreciate this feature.&lt;br /&gt;
&lt;br /&gt;
===New core databases===&lt;br /&gt;
New databases appear in the &#039;&#039;SAVE&#039;&#039; folder:&lt;br /&gt;
 $ ls SAVE&lt;br /&gt;
 ns.db1 ns.wf ns.kb_pp_pwscf &#039;&#039;&#039;ndb.gops ndb.kindx&#039;&#039;&#039;&lt;br /&gt;
 ns.wf_fragments_1_1 ...&lt;br /&gt;
 ns.kb_pp_pwscf_fragment_1 ...&lt;br /&gt;
These contain information about the &#039;&#039;G&#039;&#039;-vector shells and &#039;&#039;k/q&#039;&#039;-point meshes as defined by the DFT calculation.&lt;br /&gt;
&lt;br /&gt;
In general: a database called &#039;&#039;n&#039;&#039;&#039;s&#039;&#039;&#039;.xxx&#039;&#039; is a &#039;&#039;static&#039;&#039; database, generated once by &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt;, while databases called &#039;&#039;n&#039;&#039;&#039;db&#039;&#039;&#039;.xxx&#039;&#039; are &#039;&#039;dynamically&#039;&#039; generated while you use &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: if you launch &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;, but it does not seem to do anything, check that these files are present.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Report file===&lt;br /&gt;
A report file &#039;&#039;r_setup&#039;&#039; is generated in the run directory. &lt;br /&gt;
This mostly reports information about the ground state system as defined by the DFT run, but also adds information about the band gaps, occupations, shells of G-vectors, IBZ/BZ grids, the CPU structure (for parallel runs), and so on. Some points of note:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Up to Yambo version 4.5&lt;br /&gt;
  [02.03] RL shells&lt;br /&gt;
  =================&lt;br /&gt;
  Shells, format: [S#] G_RL(mHa)&lt;br /&gt;
   [S453]:8029(0.7982E+5) [S452]:8005(0.7982E+5) [S451]:7981(0.7982E+5) [S450]:7957(0.7942E+5)&lt;br /&gt;
   ...&lt;br /&gt;
   [S4]:11( 1183.) [S3]:5( 532.5123) [S2]:3( 133.1281) [S1]:1( 0.000000)&lt;br /&gt;
&lt;br /&gt;
From Yambo version 5.0&lt;br /&gt;
  [02.03] Reciprocal space&lt;br /&gt;
  ========================&lt;br /&gt;
  &lt;br /&gt;
  nG shells         :  217&lt;br /&gt;
  nG charge         :   3187&lt;br /&gt;
  nG WFs            :  1477&lt;br /&gt;
  nC WFs            :  1016&lt;br /&gt;
  G-vecs. in first 21 shells:  [ Number ]&lt;br /&gt;
     1    3    5   11   13   25   37   39   51&lt;br /&gt;
    63   65   71   83   95  107  113  125  127&lt;br /&gt;
   139  151  163&lt;br /&gt;
  ...&lt;br /&gt;
  Shell energy in first 21 shells:  [ mHa ]&lt;br /&gt;
    0.00000      133.128      532.512      1183.37      1198.15      1316.50      1715.88      2130.05      2381.52&lt;br /&gt;
    3313.42      3328.20      3550.11      3683.24      4082.62      4511.57      4733.48      4748.27      4792.61&lt;br /&gt;
    4866.61      5266.00      5680.16&lt;br /&gt;
  ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This reports the set of closed reciprocal lattice (RL) shells defined internally that contain G-vectors with the same modulus. &lt;br /&gt;
The highest number of RL vectors we can use is 8029.  Yambo will always redefine any input variable in RL units to the nearest closed shell.&lt;br /&gt;
&lt;br /&gt;
Up to Yambo version 4.5&lt;br /&gt;
  [02.05] Energies [ev] &amp;amp; Occupations&lt;br /&gt;
  ===================================&lt;br /&gt;
  Fermi Level        [ev]:  5.112805&lt;br /&gt;
  VBM / CBm          [ev]:  0.000000  3.876293&lt;br /&gt;
  Electronic Temp. [ev K]:  0.00      0.00&lt;br /&gt;
  Bosonic    Temp. [ev K]:  0.00      0.00&lt;br /&gt;
  El. density      [cm-3]: 0.460E+24&lt;br /&gt;
  States summary         : Full        Metallic    Empty&lt;br /&gt;
                           0001-0008               0009-0100&lt;br /&gt;
  Indirect Gaps      [ev]: 3.876293  7.278081&lt;br /&gt;
  Direct Gaps        [ev]:  4.28829  11.35409&lt;br /&gt;
  X BZ K-points :  72&lt;br /&gt;
&lt;br /&gt;
From Yambo version 5.0&lt;br /&gt;
  [02.05] Energies &amp;amp; Occupations&lt;br /&gt;
  ==============================&lt;br /&gt;
  &lt;br /&gt;
  [X] === General ===&lt;br /&gt;
  [X] Electronic Temperature                        :  0.000000  0.000000 [eV K]&lt;br /&gt;
  [X] Bosonic    Temperature                        :  0.000000  0.000000 [eV K]&lt;br /&gt;
  [X] Finite Temperature mode                       : no&lt;br /&gt;
  [X] El. density                                   :  0.46037E+24 [cm-3]&lt;br /&gt;
  [X] Fermi Level                                   :  5.110835 [eV]&lt;br /&gt;
  &lt;br /&gt;
  [X] === Gaps and Widths ===&lt;br /&gt;
  [X] Conduction Band Min                           :  3.877976 [eV]&lt;br /&gt;
  [X] Valence Band Max                              :  0.000000 [eV]&lt;br /&gt;
  [X] Filled Bands                                  :   8&lt;br /&gt;
  [X] Empty Bands                                   :    9  100&lt;br /&gt;
  [X] Direct Gap                                    :  4.289853 [eV]&lt;br /&gt;
  [X] Direct Gap localized at k-point               :   7&lt;br /&gt;
  [X] Indirect Gap                                  :  3.877976 [eV]&lt;br /&gt;
  [X] Indirect Gap between k-points                 :  14   7&lt;br /&gt;
  [X] Last valence band width                       :  3.401086 [eV]&lt;br /&gt;
  [X] 1st conduction band width                     :  4.266292 [eV]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yambo recalculates the Fermi level (close to the value of 5.06 eV noted in the PWscf SCF calculation). From here on, however, the Fermi level is set to zero, and other eigenvalues are shifted accordingly. The system is insulating (8 filled, 92 empty) with an indirect band gap of 3.87 eV. The direct and indirect gaps are indicated. There are 72 k-points in the full BZ, generated using symmetry from the 14 k-points in our user-defined grid.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: You should inspect the report file after &#039;&#039;every&#039;&#039; run for errors and warnings.&lt;br /&gt;
&lt;br /&gt;
===Different ways of running yambo===&lt;br /&gt;
So far we have only run Yambo interactively.&lt;br /&gt;
&lt;br /&gt;
Let&#039;s try to re-run the setup with the command&lt;br /&gt;
 $ nohup yambo &amp;amp;&lt;br /&gt;
 $ ls&lt;br /&gt;
 l_setup  nohup.out  r_setup  r_setup_01  SAVE&lt;br /&gt;
&lt;br /&gt;
If Yambo is launched using a script, or as a background process, or in parallel, this output will appear in a log file prefixed by the letter &#039;&#039;l&#039;&#039;, in this case as &#039;&#039;l_setup&#039;&#039;.&lt;br /&gt;
If this log file already exists from a previous run, it will not be overwritten. Instead, a new file will be created with an incrementing numerical label, e.g. &#039;&#039;l_setup_01, l_setup_02&#039;&#039;, etc. &#039;&#039;&#039;This applies to all files created by Yambo&#039;&#039;&#039;. Here we see that l_setup was created for the first time, but r_setup already existed from the previous run, so now we have r_setup_01.&lt;br /&gt;
If you check the differences between the two, you will notice that in the second run yambo reads the previously created ndb.kindx instead of re-computing the indexes.&lt;br /&gt;
Indeed, the output inside l_setup does not show the timing for X and Sigma.&lt;br /&gt;
&lt;br /&gt;
As a last step we run the setup in parallel, but first we delete the ndb.kindx file &lt;br /&gt;
 $ rm SAVE/ndb.kindx&lt;br /&gt;
 $ mpirun -np 4 yambo &lt;br /&gt;
 $ ls&lt;br /&gt;
 LOG  l_setup  nohup.out  r_setup  r_setup_01  r_setup_02  SAVE&lt;br /&gt;
There is now r_setup_02.&lt;br /&gt;
In the case of parallel runs, CPU-dependent log files will appear inside a &#039;&#039;LOG&#039;&#039; folder, e.g. &lt;br /&gt;
 $ ls LOG&lt;br /&gt;
 l_setup_CPU_1   l_setup_CPU_2  l_setup_CPU_3  l_setup_CPU_4&lt;br /&gt;
This behaviour can be controlled at runtime - see the Parallel tutorial for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===2D hBN===&lt;br /&gt;
Simply repeat the steps above. Go to the folder &#039;&#039;containing&#039;&#039; the hBN-sheet &#039;&#039;SAVE&#039;&#039; directory and launch &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;:&lt;br /&gt;
 $ cd TUTORIALS/hBN-2D/YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
 $ yambo&lt;br /&gt;
Again, inspect the &#039;&#039;r_setup&#039;&#039; file, output logs, and verify that &#039;&#039;ndb.gops&#039;&#039; and &#039;&#039;ndb.kindx&#039;&#039; have been created inside the SAVE folder.&lt;br /&gt;
&lt;br /&gt;
You are now ready to use Yambo!&lt;br /&gt;
&lt;br /&gt;
==Yambo&#039;s command line interface==&lt;br /&gt;
Yambo uses a command line interface to select tasks, generate input files, and control the runtime behaviour. &lt;br /&gt;
&lt;br /&gt;
In this module you will learn how to select tasks, generate and modify input files, and control the runtime behaviour by using Yambo&#039;s command line interface.&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Command line options are divided into &#039;&#039;&#039;uppercase&#039;&#039;&#039; and &#039;&#039;&#039;lowercase&#039;&#039;&#039; options:&lt;br /&gt;
* Lowercase: select tasks, generate input files, and (by default) launch a file editor&lt;br /&gt;
* Uppercase: modify Yambo&#039;s default settings, at run time and when generating input files&lt;br /&gt;
Lowercase and uppercase options can be used together.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Input file generator ===&lt;br /&gt;
We are going to work again with bulk hBN.&lt;br /&gt;
First, move to the appropriate folder and initialize the Yambo databases if you haven&#039;t already done so.&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo                    &#039;&#039;(initialize)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Yambo generates its own input files: you just tell the code what you want to calculate by launching Yambo along with one or more options. &lt;br /&gt;
To see the list of possible options, run &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt; (we report here only the part we are focusing on):&lt;br /&gt;
 $ yambo -h&lt;br /&gt;
 &#039;A shiny pot of fun and happiness [C.D.Hogan]&#039; &lt;br /&gt;
 &lt;br /&gt;
 This is      : yambo&lt;br /&gt;
 Version      : 5.3.0 Revision 23927 Hash 1730222ea&lt;br /&gt;
 Configuration: MPI+OpenMP+SLK+SLEPC+HDF5_MPI_IO&lt;br /&gt;
 &lt;br /&gt;
 ...&lt;br /&gt;
 &lt;br /&gt;
 Initializations:&lt;br /&gt;
 -setup           (-i)            :Initialization&lt;br /&gt;
 -coulomb         (-r)            :Coulomb potential&lt;br /&gt;
 -rw              (-w)            :Screened coulomb potential&lt;br /&gt;
 &lt;br /&gt;
 Response Functions:&lt;br /&gt;
 -optics          (-o) &amp;lt;string&amp;gt;   :Linear Response optical properties (more with -h optics)&lt;br /&gt;
 -X               (-d) &amp;lt;string&amp;gt;   :Inverse Dielectric Matrix (more with -h X)&lt;br /&gt;
 -dipoles         (-q)            :Oscillator strenghts (or dipoles)&lt;br /&gt;
 -kernel          (-k) &amp;lt;string&amp;gt;   :Kernel (more with -h kernel)&lt;br /&gt;
 &lt;br /&gt;
 Self-Energy:&lt;br /&gt;
 -hf              (-x)            :Hartree-Fock&lt;br /&gt;
 -gw0             (-p) &amp;lt;string&amp;gt;   :GW approximation (more with -h gw0)&lt;br /&gt;
 -dyson           (-g) &amp;lt;string&amp;gt;   :Dyson Equation solver (more with -h dyson)&lt;br /&gt;
 -lifetimes       (-l)            :GoWo Quasiparticle lifetimes&lt;br /&gt;
 &lt;br /&gt;
 Bethe-Salpeter Equation:&lt;br /&gt;
 -Ksolver         (-y) &amp;lt;string&amp;gt;   :BSE solver (more with -h Ksolver)&lt;br /&gt;
 &lt;br /&gt;
 Total Energy:&lt;br /&gt;
 -acfdt                           :ACFDT Total Energy&lt;br /&gt;
 &lt;br /&gt;
 Utilites:&lt;br /&gt;
 ...&lt;br /&gt;
 -slktest                         :ScaLapacK test&lt;br /&gt;
&lt;br /&gt;
The options can be split into two sets: &amp;lt;br&amp;gt;&lt;br /&gt;
* A set of options which are needed to generate the appropriate input file (default name: &#039;&#039;yambo.in&#039;&#039;), selecting the kind of simulation you would like to perform  &amp;lt;br&amp;gt;&lt;br /&gt;
* A set of options which can be used to manage auxiliary functions (like redirecting the I/O, choosing a specific name for the input file, etc.).&lt;br /&gt;
&lt;br /&gt;
===Runlevel selection===&lt;br /&gt;
First of all, you would like to specify which kind of simulation you are going to perform and generate an input file with the first set of options. &lt;br /&gt;
By default, when generating the input file, Yambo will launch the &amp;lt;code&amp;gt;vi&amp;lt;/code&amp;gt; editor.&lt;br /&gt;
The editor choice can be changed at configure time, before compilation; alternatively, you can use the &amp;lt;code&amp;gt;-Q&amp;lt;/code&amp;gt; run-time option to skip the automatic editing (do this if you are not familiar with &amp;lt;code&amp;gt;vi&amp;lt;/code&amp;gt;!):&lt;br /&gt;
 $ yambo -hf -Q&lt;br /&gt;
 yambo: input file yambo.in created&lt;br /&gt;
 $ emacs yambo.in     &#039;&#039;or your favourite editing tool&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Multiple options can be used together to activate various tasks or runlevels (in some cases this is actually a necessity). &lt;br /&gt;
For instance, to generate an input file for optical spectra including local field effects (Hartree approximation), do (and then exit)&lt;br /&gt;
 $ yambo -optics c -kernel hartree       &#039;&#039;which switches on:&#039;&#039;&lt;br /&gt;
 optics                       # [R] Linear Response optical properties&lt;br /&gt;
 chi                          # [R][CHI] Dyson equation for Chi.&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;            # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
To perform a Hartree-Fock and GW calculation using a plasmon-pole approximation, do (and then exit):&lt;br /&gt;
 $ yambo -hf -gw0 p -dyson n        &#039;&#039;which switches on:&#039;&#039;&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix      &lt;br /&gt;
Each runlevel activates its own list of variables and flags.&lt;br /&gt;
&lt;br /&gt;
The previous command is also equivalent to&lt;br /&gt;
 $ yambo -hf -gw0 r -dyson n -X p&lt;br /&gt;
&lt;br /&gt;
===Changing input parameters ===&lt;br /&gt;
Yambo reads various parameters from existing database files and/or input files and uses them to suggest values or ranges. &lt;br /&gt;
Let&#039;s illustrate this by generating the input file for a Hartree-Fock calculation. &lt;br /&gt;
&lt;br /&gt;
 $ yambo -hf&lt;br /&gt;
Inside the generated input file you should find:&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] =  3187        RL    # [XX] Exchange RL components&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
   1| 14|  6|10|&lt;br /&gt;
 %&lt;br /&gt;
The &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable (follow the link for a detailed explanation of any variable) suggests a range of k-points (1 to 14) and bands (here 6 to 10) based on what it finds in the core database &#039;&#039;SAVE/ns.db1&#039;&#039;, i.e. as defined by the DFT code. &amp;lt;br&amp;gt;&lt;br /&gt;
Leave that variable alone, and instead modify the previous variable to &amp;lt;code&amp;gt;EXXRLvcs=  1000        RL&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Save the file, and now generate the input a second time with &amp;lt;code&amp;gt;yambo -x&amp;lt;/code&amp;gt;. You will see:&lt;br /&gt;
  [[Variables#EXXRLvcs|EXXRLvcs]]=  1009        RL&lt;br /&gt;
This indicates that Yambo has read the new input value (1000 G-vectors), checked the database of G-vector shells &#039;&#039;(SAVE/ndb.gops)&#039;&#039;, &lt;br /&gt;
and changed the input value to one that fits a completely closed shell.&lt;br /&gt;
&lt;br /&gt;
Last, note that Yambo variables can be expressed in different &#039;&#039;&#039;units&#039;&#039;&#039;. In this case, &amp;lt;code&amp;gt;RL&amp;lt;/code&amp;gt; can be replaced by an energy unit like Ry, eV, Ha, etc. Energy units are generally better as they are independent of the cell size. Technical information is available on the [[Variables]] page. &lt;br /&gt;
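For example, the same cutoff could be specified with an energy unit instead (the value below is purely illustrative, not a converged choice):&lt;br /&gt;
 EXXRLvcs= 2              Ry    # [XX] Exchange RL components&lt;br /&gt;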
&lt;br /&gt;
The input file generator of Yambo is thus an &#039;&#039;intelligent&#039;&#039; parser, which interacts with the user and the existing databases. For this reason we recommend that you always use Yambo to generate the input files, rather than making them yourself.&lt;br /&gt;
&lt;br /&gt;
===Extra options===&lt;br /&gt;
Extra options modify some of the code&#039;s default settings. They can be used when launching the code but also when generating input files. &lt;br /&gt;
&lt;br /&gt;
Let&#039;s have another look at the possible options (we report here only the part we are focusing on):&lt;br /&gt;
 $ yambo -h&lt;br /&gt;
 This is      : yambo&lt;br /&gt;
 Version      : 5.0.1 Revision 19547 Hash e90d90f2d &lt;br /&gt;
 Configuration: MPI+OpenMP+SLK+SLEPC+HDF5_MPI_IO &lt;br /&gt;
 &lt;br /&gt;
 Help &amp;amp; version:&lt;br /&gt;
 -help            (-h) &amp;lt;string&amp;gt;   :&amp;lt;string&amp;gt; can be an option (e.g. -h optics)&lt;br /&gt;
 -version                         :Code version &amp;amp; libraries&lt;br /&gt;
 &lt;br /&gt;
 Input file &amp;amp; Directories:&lt;br /&gt;
 -Input           (-F) &amp;lt;string&amp;gt;   :Input file&lt;br /&gt;
 -Verbosity       (-V) &amp;lt;string&amp;gt;   :Input file variables verbosity (more with -h Verbosity)&lt;br /&gt;
 -Job             (-J) &amp;lt;string&amp;gt;   :Job string&lt;br /&gt;
 -Idir            (-I) &amp;lt;string&amp;gt;   :Input directory&lt;br /&gt;
 -Odir            (-O) &amp;lt;string&amp;gt;   :I/O directory&lt;br /&gt;
 -Cdir            (-C) &amp;lt;string&amp;gt;   :Communication directory&lt;br /&gt;
 &lt;br /&gt;
 Parallel Control:&lt;br /&gt;
 -parenv          (-E) &amp;lt;string&amp;gt;   :Environment Parallel Variables file&lt;br /&gt;
 -nompi                           :Switch off MPI support&lt;br /&gt;
 -noopenmp                        :Switch off OPENMP support&lt;br /&gt;
 &lt;br /&gt;
 ...&lt;br /&gt;
 &lt;br /&gt;
 Utilites:&lt;br /&gt;
 -Quiet           (-Q)            :Quiet input file creation&lt;br /&gt;
 -fatlog                          :Verbose (fatter) log(s)&lt;br /&gt;
 -DBlist          (-D)            :Databases properties&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Command line options are extremely important to master if you want to use yambo productively. &lt;br /&gt;
Often, the meaning is clear from the help menu:&lt;br /&gt;
 $ yambo -F yambo.in_HF -hf   &#039;&#039;Make a Hartree-Fock input file called yambo.in_HF&#039;&#039;&lt;br /&gt;
 $ yambo -D                   &#039;&#039;Summarize the content of the databases in the SAVE folder&#039;&#039;&lt;br /&gt;
 $ yambo -I ../               &#039;&#039;Run the code, using a SAVE folder in a directory one level up&#039;&#039;&lt;br /&gt;
 $ yambo -C MyTest            &#039;&#039;Run the code, putting all report, log, plot files inside a folder MyTest&#039;&#039;&lt;br /&gt;
 &lt;br /&gt;
Other options deserve a closer look.&lt;br /&gt;
&lt;br /&gt;
===Verbosity===&lt;br /&gt;
Yambo uses &#039;&#039;many&#039;&#039; input variables, many of which can be left at their default values. To keep input files short and manageable, only a few variables appear by default in the input file. More advanced variables can be switched on by using the &amp;lt;code&amp;gt;-V&amp;lt;/code&amp;gt; verbosity option. These are grouped according to the type of variable. For instance, &amp;lt;code&amp;gt;-V RL&amp;lt;/code&amp;gt; switches on variables related to G vector summations, and &amp;lt;code&amp;gt;-V io&amp;lt;/code&amp;gt; switches on options related to I/O control. Try: &lt;br /&gt;
&lt;br /&gt;
 $ yambo -optics c -V RL       &#039;&#039;switches on:&#039;&#039;&lt;br /&gt;
 FFTGvecs=  3951        RL    # [FFT] Plane-waves&lt;br /&gt;
 &lt;br /&gt;
 $ yambo -optics c -V io       &#039;&#039;switches on:&#039;&#039;&lt;br /&gt;
 StdoHash=  40                # [IO] Live-timing Hashes&lt;br /&gt;
 DBsIOoff= &amp;quot;none&amp;quot;             # [IO] Space-separated list of DB with NO I/O. DB= ...&lt;br /&gt;
 DBsFRAGpm= &amp;quot;none&amp;quot;            # [IO] Space-separated list of +DB to be FRAG and ...&lt;br /&gt;
 #WFbuffIO                    # [IO] Wave-functions buffered I/O&lt;br /&gt;
&lt;br /&gt;
Unfortunately, -V options must be invoked and changed &#039;&#039;one at a time&#039;&#039;. When you are more experienced, you may go straight to &amp;lt;code&amp;gt;-V all&amp;lt;/code&amp;gt;, which turns on all possible variables. However, note that &amp;lt;code&amp;gt;yambo -o c -V all&amp;lt;/code&amp;gt; adds an extra 30 variables to the input file, which can be confusing: use it with care.&lt;br /&gt;
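If you just want to inspect the full variable list without being dropped into the editor, you can combine it with the flags introduced above (the file name here is arbitrary):&lt;br /&gt;
 $ yambo -optics c -V all -Q -F yambo_allvars.in&lt;br /&gt;
 $ less yambo_allvars.in&lt;br /&gt;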
&lt;br /&gt;
===Job script label===&lt;br /&gt;
The best way to keep track of different runs using different parameters is through the &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag. This inserts a label in all output and report files, and creates a new folder containing any new databases (i.e. they are not written in the core &#039;&#039;SAVE&#039;&#039; folder). Try:&lt;br /&gt;
 $ yambo -V RL -hf -F yambo_hf.in        &#039;&#039;and modify to&#039;&#039;&lt;br /&gt;
 FFTGvecs = 1 Ry&lt;br /&gt;
 EXXRLvcs = 1 Ry&lt;br /&gt;
 VXCRLvcs = 1 Ry&lt;br /&gt;
 $ yambo -J 1Ry -F yambo_hf.in           &#039;&#039;Run the code&#039;&#039;&lt;br /&gt;
 $ ls&lt;br /&gt;
 yambo_hf.in SAVE  &lt;br /&gt;
 o-1Ry.hf r-1Ry_HF_and_locXC 1Ry 1Ry/ndb.HF_and_locXC&lt;br /&gt;
This is extremely useful when running convergence tests, trying out different parameters, etc.&lt;br /&gt;
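For instance, a simple convergence scan over the exchange cutoff could be scripted along these lines (a sketch: it assumes you have prepared input files yambo_hf_1Ry.in, yambo_hf_2Ry.in, ... with increasing cutoffs as above):&lt;br /&gt;
 $ for cut in 1 2 4; do yambo -J ${cut}Ry -F yambo_hf_${cut}Ry.in; done&lt;br /&gt;
Each run then produces its own labelled outputs, e.g. o-1Ry.hf, o-2Ry.hf, and so on.&lt;br /&gt;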
&lt;br /&gt;
&#039;&#039;Exercise&#039;&#039;: use &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; to report the properties of all database files (including &#039;&#039;ndb.HF_and_locXC&#039;&#039;)&lt;br /&gt;
&lt;br /&gt;
==Links==&lt;br /&gt;
* Back to [[ICTP 2022#Tutorials]]&lt;br /&gt;
* Back to [[CECAM VIRTUAL 2021#Tutorials]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| style=&amp;quot;width:100%&amp;quot; border=&amp;quot;1&amp;quot;&lt;br /&gt;
|style=&amp;quot;width:15%; text-align:left&amp;quot;|Prev: [[CECAM_VIRTUAL_2021#Tutorials|CECAM School Home]] &lt;br /&gt;
|style=&amp;quot;width:50%; text-align:center&amp;quot;|Now: CECAM School Home -&amp;gt; [[First_steps:_walk_through_from_DFT(standalone)|First steps]] &lt;br /&gt;
|style=&amp;quot;width:35%; text-align:right&amp;quot;|Next: CECAM School Home -&amp;gt; [[Next steps: RPA calculations (standalone)|Next steps]]&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=First_steps:_walk_through_from_DFT(standalone)&amp;diff=8675</id>
		<title>First steps: walk through from DFT(standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=First_steps:_walk_through_from_DFT(standalone)&amp;diff=8675"/>
		<updated>2025-05-18T13:56:11Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Input file generator */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In this tutorial you will learn how to calculate optical spectra using Yambo, starting from a DFT calculation and ending with a look at local field effects in the optical response. &lt;br /&gt;
&lt;br /&gt;
== System characteristics ==&lt;br /&gt;
We will use a 3D system (bulk hBN) and a 2D system (hBN sheet). &lt;br /&gt;
&lt;br /&gt;
[[File:HBN-bulk-3x3-annotated.png|x200px|Atomic structure of bulk hBN]]&lt;br /&gt;
[[File:HBN2.png|x200px|Atomic structure of 2D hBN]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hexagonal boron nitride - hBN&#039;&#039;&#039;:&lt;br /&gt;
* HCP lattice, ABAB stacking&lt;br /&gt;
* Four atoms per cell, B and N (16 electrons)&lt;br /&gt;
* Lattice constants: &#039;&#039;a&#039;&#039; = 4.716 [a.u.], &#039;&#039;c/a&#039;&#039; = 2.582&lt;br /&gt;
* Plane wave cutoff 40 Ry (~1500 RL vectors in wavefunctions)&lt;br /&gt;
* SCF run: shifted &#039;&#039;6x6x2&#039;&#039; grid (12 k-points) with 8 bands &lt;br /&gt;
* Non-SCF run: gamma-centred &#039;&#039;6x6x2&#039;&#039; (14 k-points) grid with 100 bands&lt;br /&gt;
&lt;br /&gt;
=== Prerequisites ===&lt;br /&gt;
&#039;&#039;&#039;You will need&#039;&#039;&#039;:&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* PWSCF input files and pseudopotentials for hBN bulk &lt;br /&gt;
* &amp;lt;code&amp;gt;pw.x&amp;lt;/code&amp;gt; executable, version 5.0 or later&lt;br /&gt;
* &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; executables&lt;br /&gt;
* &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; for plotting spectra&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* Before starting, [[Get Tutorial files CECAM2021|get tutorial files]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
* Before starting, get the hBN tutorial files [https://www.yambo-code.eu/wiki/index.php/Tutorials#Tutorial_files here]&lt;br /&gt;
* &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; executable&lt;br /&gt;
* &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; for plotting spectra&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
==Download the Files==&lt;br /&gt;
&lt;br /&gt;
Download and unpack the followint files:&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN.tar.gz hBN.tar.gz] [15 MB],&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN-2D.tar.gz hBN-2D.tar.gz] [8,6 MB]&lt;br /&gt;
&lt;br /&gt;
In the next days you could also use this file which you may like to download now&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN-convergence-kpoints.tar.gz hBN-convergence-kpoints.tar.gz] [254 MB]&lt;br /&gt;
&lt;br /&gt;
After downloading the tar.gz files just unpack them in the &#039;&#039;&#039;YAMBO_TUTORIALS&#039;&#039;&#039; folder. For example&lt;br /&gt;
 $ mkdir YAMBO_TUTORIALS&lt;br /&gt;
 $ mv hBN.tar.gz YAMBO_TUTORIALS&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ tar -xvfz hBN.tar.gz&lt;br /&gt;
 $ ls YAMBO_TUTORIALS&lt;br /&gt;
   hBN&lt;br /&gt;
 $ cd hBN&lt;br /&gt;
 $ ls&lt;br /&gt;
 $ PWSCF YAMBO&lt;br /&gt;
&lt;br /&gt;
(Advanced users can download and install all tutorial files using git. See the main  [[Tutorials#Files|Tutorial Files]] page.)&lt;br /&gt;
&lt;br /&gt;
Now you can go directly in &#039;&#039;&#039;YAMBO&#039;&#039;&#039; folder where you can find the &#039;&#039;&#039;SAVE&#039;&#039;&#039; folder which is needed to start  and go directly to &#039;&#039;&#039;[Initialization of Yambo databases]&#039;&#039;&#039;  below, which is always the first step you have to perform for any simulation using the Yambo code.&lt;br /&gt;
&lt;br /&gt;
Or if you wish you can learn  also how to start from the DFT simulations doing a scf and nscf calculation, entering in &#039;&#039;&#039;PWSCF&#039;&#039;&#039; folder. In this way you will see how you can create the &#039;&#039;&#039;SAVE&#039;&#039;&#039; folder starting from *.save directory produced by pw.x.&lt;br /&gt;
&lt;br /&gt;
==DFT calculation of bulk hBN and conversion to Yambo==&lt;br /&gt;
&lt;br /&gt;
In this module you will learn how to generate the Yambo &#039;&#039;SAVE&#039;&#039; folder for bulk hBN starting from a PWscf calculation.&lt;br /&gt;
&lt;br /&gt;
=== DFT calculations ===&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/PWSCF&lt;br /&gt;
 $ ls&lt;br /&gt;
 Inputs		Pseudos		PostProcessing		References&lt;br /&gt;
 hBN_scf.in	hBN_nscf.in     hBN_scf_plot_bands.in  hBN_nscf_plot_bands.in &lt;br /&gt;
&lt;br /&gt;
First run the SCF calculation to generate the ground-state charge density, occupations, Fermi level, and so on:&lt;br /&gt;
 $ pw.x &amp;lt; hBN_scf.in &amp;gt; hBN_scf.out&lt;br /&gt;
Inspection of the output shows that the valence band maximum lies at 5.06eV.&lt;br /&gt;
&lt;br /&gt;
Next run a non-SCF calculation to generate a set of Kohn-Sham eigenvalues and eigenvectors for both occupied and unoccupied states (100 bands): &lt;br /&gt;
 $ pw.x &amp;lt; hBN_nscf.in &amp;gt; hBN_nscf.out                  &#039;&#039;(serial run, ~1 min) OR&#039;&#039;&lt;br /&gt;
 $ mpirun -np 2 pw.x &amp;lt; hBN_nscf.in &amp;gt; hBN_nscf.out       &#039;&#039;(parallel run, 40s)&#039;&#039;&lt;br /&gt;
Here we use a &#039;&#039;6x6x2&#039;&#039; grid giving 14 k-points, but denser grids should be used for checking convergence of Yambo runs.&lt;br /&gt;
&lt;br /&gt;
Note the presence of the following flags in the input file:&lt;br /&gt;
 wf_collect=.true.&lt;br /&gt;
 force_symmorphic=.true.&lt;br /&gt;
 diago_thr_init=5.0e-6,&lt;br /&gt;
 diago_full_acc=.true.&lt;br /&gt;
which are needed for generating the Yambo databases accurately. Full explanations of these variables are given on the [http://www.quantum-espresso.org/wp-content/uploads/Doc/INPUT_PW.html quantum-ESPRESSO input variables page]. &lt;br /&gt;
&lt;br /&gt;
After these two runs, you should have a &#039;&#039;hBN.save&#039;&#039; directory:&lt;br /&gt;
 $ ls hBN.save&lt;br /&gt;
 data-file.xml charge-density.dat gvectors.dat B.pz-vbc.UPF N.pz-vbc.UPF&lt;br /&gt;
 K00001	K00002 .... 	K00035	K00036&lt;br /&gt;
&lt;br /&gt;
=== Conversion to Yambo format ===&lt;br /&gt;
Once you have performed a nscf simulation with pw.x the PWscf &#039;&#039;bBN.save&#039;&#039; should not be empty and you can then convert it to the Yambo format using the &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt; executable (pwscf to yambo), found in the yambo &#039;&#039;bin&#039;&#039; directory.&lt;br /&gt;
Enter &#039;&#039;hBN.save&#039;&#039; and launch &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd hBN.save&lt;br /&gt;
 $ p2y&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; DBs path set to .&lt;br /&gt;
 &amp;lt;--%-&amp;gt; Index file set to data-file.xml&lt;br /&gt;
 &amp;lt;--%-&amp;gt; Header/K-points/Energies... done&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB1 (Gvecs and more) ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; ... Database done&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB2 (wavefunctions)  ... done ==&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB3 (PseudoPotential) ... done ==&lt;br /&gt;
 &amp;lt;--%-&amp;gt;  == P2Y completed ==&lt;br /&gt;
&lt;br /&gt;
This output repeats some information about the system and generates a &#039;&#039;SAVE&#039;&#039; directory:&lt;br /&gt;
 $ ls SAVE&lt;br /&gt;
 ns.db1  ns.wf  ns.kb_pp_pwscf&lt;br /&gt;
 ns.wf_fragments_1_1 ...&lt;br /&gt;
 ns.kb_pp_pwscf_fragment_1 ...&lt;br /&gt;
These files, with an &#039;&#039;n&#039;&#039; prefix, indicate that they are in netCDF format, and thus not human readable. However, they are perfectly transferable across different architectures. You can check that the databases contain the information you expect by launching Yambo using the &amp;lt;code&amp;gt;-D&amp;lt;/code&amp;gt; option:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -D&lt;br /&gt;
 [RD./SAVE//ns.db1]------------------------------------------&lt;br /&gt;
 Bands                           : 100&lt;br /&gt;
 K-points                        : 14&lt;br /&gt;
 G-vectors             [RL space]:  8029&lt;br /&gt;
 Components       [wavefunctions]: 1016&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./SAVE//ns.wf]-------------------------------------------&lt;br /&gt;
 Fragmentation                    :yes&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./SAVE//ns.kb_pp_pwscf]----------------------------------&lt;br /&gt;
 Fragmentation                    :yes&lt;br /&gt;
 - S/N 006626 -------------------------- v.04.01.02 r.00000 -&lt;br /&gt;
&lt;br /&gt;
In practice we suggest to move the &#039;&#039;SAVE&#039;&#039; folder into a new clean folder. &lt;br /&gt;
&lt;br /&gt;
In this tutorial however, we ask instead that you continue using a &#039;&#039;SAVE&#039;&#039; folder that we prepared previously:&lt;br /&gt;
 $ cd ../../YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Initialization of Yambo databases==&lt;br /&gt;
&amp;lt;!-- Use the &#039;&#039;SAVE&#039;&#039; folders that are already provided, rather than any ones you may have generated previously.  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Every Yambo run &#039;&#039;&#039;must&#039;&#039;&#039; start with this step. Go to the folder &#039;&#039;containing&#039;&#039; the hBN-bulk &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory:&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: do not run yambo from &#039;&#039;inside&#039;&#039; the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; folder!&lt;br /&gt;
&#039;&#039;&#039;This is the wrong way .. &#039;&#039;&#039;&lt;br /&gt;
 $ cd SAVE&lt;br /&gt;
 $ yambo&lt;br /&gt;
 yambo: cannot access CORE database (SAVE/*db1 and/or SAVE/*wf)&lt;br /&gt;
In fact, if you ever see such a message, it usually means you are trying to launch Yambo &#039;&#039;&#039;from the wrong place&#039;&#039;&#039;.&lt;br /&gt;
 $ cd ..&lt;br /&gt;
&lt;br /&gt;
Now you are in the proper place and&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
you can simply launch the code&lt;br /&gt;
 $ yambo &lt;br /&gt;
This will run the initialization (setup) &#039;&#039;runlevel&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
===Run-time output===&lt;br /&gt;
This is typically written to standard output (on screen) and tracks the progress of the run in real time:&lt;br /&gt;
 &amp;lt;---&amp;gt; [01] MPI/OPENMP structure, Files &amp;amp; I/O Directories&lt;br /&gt;
 &amp;lt;---&amp;gt; [02] CORE Variables Setup&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.01] Unit cells&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.02] Symmetries&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.03] Reciprocal space&lt;br /&gt;
 &amp;lt;---&amp;gt; Shells finder |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.04] K-grid lattice&lt;br /&gt;
 &amp;lt;---&amp;gt; Grid dimensions      :   6   6   2&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.05] Energies &amp;amp; Occupations&lt;br /&gt;
 &amp;lt;---&amp;gt; [03] Transferred momenta grid and indexing&lt;br /&gt;
 &amp;lt;---&amp;gt; BZ -&amp;gt; IBZ reduction |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [03.01] X indexes&lt;br /&gt;
 &amp;lt;---&amp;gt; X [eval] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; X[REDUX] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [03.01.01] Sigma indexes&lt;br /&gt;
 &amp;lt;---&amp;gt; Sigma [eval] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; Sigma[REDUX] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [04] Timing Overview&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Memory Overview&lt;br /&gt;
 &amp;lt;---&amp;gt; [06] Game Over &amp;amp; Game summary&lt;br /&gt;
Specific runlevels are indicated with numeric labels like [02.02]. &amp;lt;br&amp;gt;&lt;br /&gt;
The hashes (#) indicate progress of the run in Wall Clock time, indicating the elapsed (E) and expected (X) time to complete a runlevel, and the percentage of the task complete.&lt;br /&gt;
In this case the simulation is so fast that there is not much output to follow. On longer simulations you will be able to appreciate this feature.&lt;br /&gt;
&lt;br /&gt;
===New core databases===&lt;br /&gt;
New databases appear in the &#039;&#039;SAVE&#039;&#039; folder:&lt;br /&gt;
 $ ls SAVE&lt;br /&gt;
 ns.db1 ns.wf ns.kb_pp_pwscf &#039;&#039;&#039;ndb.gops ndb.kindx&#039;&#039;&#039;&lt;br /&gt;
 ns.wf_fragments_1_1 ...&lt;br /&gt;
 ns.kb_pp_pwscf_fragment_1 ...&lt;br /&gt;
These contain information about the &#039;&#039;G&#039;&#039;-vector shells and &#039;&#039;k/q&#039;&#039;-point meshes as defined by the DFT calculation.&lt;br /&gt;
&lt;br /&gt;
In general: a database called &#039;&#039;n&#039;&#039;&#039;s&#039;&#039;&#039;.xxx&#039;&#039; is a &#039;&#039;static&#039;&#039; database, generated once by &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt;, while databases called &#039;&#039;n&#039;&#039;&#039;db&#039;&#039;&#039;.xxx&#039;&#039; are &#039;&#039;dynamically&#039;&#039; generated while you use &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: if you launch &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;, but it does not seem to do anything, check that these files are present.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Report file===&lt;br /&gt;
A report file &#039;&#039;r_setup&#039;&#039; is generated in the run directory. &lt;br /&gt;
This mostly reports information about the ground state system as defined by the DFT run, but also adds information about the band gaps, occupations, shells of G-vectors, IBZ/BZ grids, the CPU structure (for parallel runs), and so on. Some points of note:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Up to Yambo version 4.5&lt;br /&gt;
  [02.03] RL shells&lt;br /&gt;
  =================&lt;br /&gt;
  Shells, format: [S#] G_RL(mHa)&lt;br /&gt;
   [S453]:8029(0.7982E+5) [S452]:8005(0.7982E+5) [S451]:7981(0.7982E+5) [S450]:7957(0.7942E+5)&lt;br /&gt;
   ...&lt;br /&gt;
   [S4]:11( 1183.) [S3]:5( 532.5123) [S2]:3( 133.1281) [S1]:1( 0.000000)&lt;br /&gt;
&lt;br /&gt;
From Yambo version 5.0&lt;br /&gt;
  [02.03] Reciprocal space&lt;br /&gt;
  ========================&lt;br /&gt;
  &lt;br /&gt;
  nG shells         :  217&lt;br /&gt;
  nG charge         :   3187&lt;br /&gt;
  nG WFs            :  1477&lt;br /&gt;
  nC WFs            :  1016&lt;br /&gt;
  G-vecs. in first 21 shells:  [ Number ]&lt;br /&gt;
     1    3    5   11   13   25   37   39   51&lt;br /&gt;
    63   65   71   83   95  107  113  125  127&lt;br /&gt;
   139  151  163&lt;br /&gt;
  ...&lt;br /&gt;
  Shell energy in first 21 shells:  [ mHa ]&lt;br /&gt;
    0.00000      133.128      532.512      1183.37      1198.15      1316.50      1715.88      2130.05      2381.52&lt;br /&gt;
    3313.42      3328.20      3550.11      3683.24      4082.62      4511.57      4733.48      4748.27      4792.61&lt;br /&gt;
    4866.61      5266.00      5680.16&lt;br /&gt;
  ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This reports the set of closed reciprocal lattice (RL) shells defined internally that contain G-vectors with the same modulus. &lt;br /&gt;
The highest number of RL vectors we can use is 8029.  Yambo will always redefine any input variable in RL units to the nearest closed shell.&lt;br /&gt;
&lt;br /&gt;
Up to Yambo version 4.5&lt;br /&gt;
  [02.05] Energies [ev] &amp;amp; Occupations&lt;br /&gt;
  ===================================&lt;br /&gt;
  Fermi Level        [ev]:  5.112805&lt;br /&gt;
  VBM / CBm          [ev]:  0.000000  3.876293&lt;br /&gt;
  Electronic Temp. [ev K]:  0.00      0.00&lt;br /&gt;
  Bosonic    Temp. [ev K]:  0.00      0.00&lt;br /&gt;
  El. density      [cm-3]: 0.460E+24&lt;br /&gt;
  States summary         : Full        Metallic    Empty&lt;br /&gt;
                           0001-0008               0009-0100&lt;br /&gt;
  Indirect Gaps      [ev]: 3.876293  7.278081&lt;br /&gt;
  Direct Gaps        [ev]:  4.28829  11.35409&lt;br /&gt;
  X BZ K-points :  72&lt;br /&gt;
&lt;br /&gt;
From Yambo version 5.0&lt;br /&gt;
  [02.05] Energies &amp;amp; Occupations&lt;br /&gt;
  ==============================&lt;br /&gt;
  &lt;br /&gt;
  [X] === General ===&lt;br /&gt;
  [X] Electronic Temperature                        :  0.000000  0.000000 [eV K]&lt;br /&gt;
  [X] Bosonic    Temperature                        :  0.000000  0.000000 [eV K]&lt;br /&gt;
  [X] Finite Temperature mode                       : no&lt;br /&gt;
  [X] El. density                                   :  0.46037E+24 [cm-3]&lt;br /&gt;
  [X] Fermi Level                                   :  5.110835 [eV]&lt;br /&gt;
  &lt;br /&gt;
  [X] === Gaps and Widths ===&lt;br /&gt;
  [X] Conduction Band Min                           :  3.877976 [eV]&lt;br /&gt;
  [X] Valence Band Max                              :  0.000000 [eV]&lt;br /&gt;
  [X] Filled Bands                                  :   8&lt;br /&gt;
  [X] Empty Bands                                   :    9  100&lt;br /&gt;
  [X] Direct Gap                                    :  4.289853 [eV]&lt;br /&gt;
  [X] Direct Gap localized at k-point               :   7&lt;br /&gt;
  [X] Indirect Gap                                  :  3.877976 [eV]&lt;br /&gt;
  [X] Indirect Gap between k-points                 :  14   7&lt;br /&gt;
  [X] Last valence band width                       :  3.401086 [eV]&lt;br /&gt;
  [X] 1st conduction band width                     :  4.266292 [eV]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yambo recalculates the Fermi level (close to the value of 5.06 eV noted in the PWscf SCF calculation). From here on, however, the Fermi level is set to zero, and all other eigenvalues are shifted accordingly. The system is insulating (8 filled bands, 92 empty) with an indirect band gap of 3.87 eV. The direct and indirect gaps are both reported. There are 72 k-points in the full BZ, generated using symmetry from the 14 k-points in our user-defined grid.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: You should inspect the report file after &#039;&#039;every&#039;&#039; run for errors and warnings.&lt;br /&gt;
&lt;br /&gt;
===Different ways of running yambo===&lt;br /&gt;
So far we have simply run Yambo interactively.&lt;br /&gt;
&lt;br /&gt;
Let&#039;s try to re-run the setup with the command&lt;br /&gt;
 $ nohup yambo &amp;amp;&lt;br /&gt;
 $ ls&lt;br /&gt;
 l_setup  nohup.out  r_setup  r_setup_01  SAVE&lt;br /&gt;
&lt;br /&gt;
If Yambo is launched using a script, or as a background process, or in parallel, this output will appear in a log file prefixed by the letter &#039;&#039;l&#039;&#039;, in this case as &#039;&#039;l_setup&#039;&#039;.&lt;br /&gt;
If this log file already exists from a previous run, it will not be overwritten. Instead, a new file will be created with an incrementing numerical label, e.g. &#039;&#039;l_setup_01, l_setup_02&#039;&#039;, etc. &#039;&#039;&#039;This applies to all files created by Yambo&#039;&#039;&#039;. Here we see that &#039;&#039;l_setup&#039;&#039; was created for the first time, but &#039;&#039;r_setup&#039;&#039; already existed from the previous run, so now we have &#039;&#039;r_setup_01&#039;&#039;.&lt;br /&gt;
If you compare the two report files, you will notice that in the second run yambo reads the previously created &#039;&#039;ndb.kindx&#039;&#039; instead of re-computing the indexes.&lt;br /&gt;
Indeed, the log in &#039;&#039;l_setup&#039;&#039; does not show the timing for the X and Sigma indexes.&lt;br /&gt;
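A quick way to compare the two reports is a plain &amp;lt;code&amp;gt;diff&amp;lt;/code&amp;gt; (the exact differences reported will of course depend on your run):&lt;br /&gt;
 $ diff r_setup r_setup_01&lt;br /&gt;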
&lt;br /&gt;
As a last step we run the setup in parallel; before doing so, we delete the &#039;&#039;ndb.kindx&#039;&#039; file:&lt;br /&gt;
 $ rm SAVE/ndb.kindx&lt;br /&gt;
 $ mpirun -np 4 yambo &lt;br /&gt;
 $ ls&lt;br /&gt;
 LOG  l_setup  nohup.out  r_setup  r_setup_01  r_setup_02  SAVE&lt;br /&gt;
There is now an additional report file, &#039;&#039;r_setup_02&#039;&#039;.&lt;br /&gt;
In the case of parallel runs, CPU-dependent log files will appear inside a &#039;&#039;LOG&#039;&#039; folder, e.g. &lt;br /&gt;
 $ ls LOG&lt;br /&gt;
 l_setup_CPU_1   l_setup_CPU_2  l_setup_CPU_3  l_setup_CPU_4&lt;br /&gt;
This behaviour can be controlled at runtime - see the Parallel tutorial for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===2D hBN===&lt;br /&gt;
Simply repeat the steps above. Go to the folder &#039;&#039;containing&#039;&#039; the hBN-sheet &#039;&#039;SAVE&#039;&#039; directory and launch &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;:&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN-2D/YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
 $ yambo&lt;br /&gt;
Again, inspect the &#039;&#039;r_setup&#039;&#039; file and the output logs, and verify that &#039;&#039;ndb.gops&#039;&#039; and &#039;&#039;ndb.kindx&#039;&#039; have been created inside the SAVE folder.&lt;br /&gt;
&lt;br /&gt;
You are now ready to use Yambo!&lt;br /&gt;
&lt;br /&gt;
==Yambo&#039;s command line interface==&lt;br /&gt;
Yambo uses a command line interface to select tasks, generate input files, and control the runtime behaviour. &lt;br /&gt;
&lt;br /&gt;
In this module you will learn how to select tasks, generate and modify input files, and control the runtime behaviour by using Yambo&#039;s command line interface.&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Command line options are divided into &#039;&#039;&#039;uppercase&#039;&#039;&#039; and &#039;&#039;&#039;lowercase&#039;&#039;&#039; options:&lt;br /&gt;
* Lowercase: select tasks, generate input files, and (by default) launch a file editor&lt;br /&gt;
* Uppercase: modify Yambo&#039;s default settings, at run time and when generating input files&lt;br /&gt;
Lowercase and uppercase options can be used together.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Input file generator ===&lt;br /&gt;
We are going to work again with bulk hBN.&lt;br /&gt;
First, move to the appropriate folder and initialize the Yambo databases if you haven&#039;t already done so.&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo                    &#039;&#039;(initialize)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Yambo generates its own input files: you just tell the code what you want to calculate by launching Yambo along with one or more options. &lt;br /&gt;
To see the list of possible options, run &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt; (only the part relevant here is reported):&lt;br /&gt;
 $ yambo -h&lt;br /&gt;
 &#039;A shiny pot of fun and happiness [C.D.Hogan]&#039; &lt;br /&gt;
 &lt;br /&gt;
 This is      : yambo&lt;br /&gt;
 Version      : 5.3.0 Revision 19547 Hash e90d90f2d&lt;br /&gt;
 Configuration: MPI+OpenMP+SLK+SLEPC+HDF5_MPI_IO&lt;br /&gt;
 &lt;br /&gt;
 ...&lt;br /&gt;
 &lt;br /&gt;
 Initializations:&lt;br /&gt;
 -setup           (-i)            :Initialization&lt;br /&gt;
 -coulomb         (-r)            :Coulomb potential&lt;br /&gt;
 -rw              (-w)            :Screened coulomb potential&lt;br /&gt;
 &lt;br /&gt;
 Response Functions:&lt;br /&gt;
 -optics          (-o) &amp;lt;string&amp;gt;   :Linear Response optical properties (more with -h optics)&lt;br /&gt;
 -X               (-d) &amp;lt;string&amp;gt;   :Inverse Dielectric Matrix (more with -h X)&lt;br /&gt;
 -dipoles         (-q)            :Oscillator strenghts (or dipoles)&lt;br /&gt;
 -kernel          (-k) &amp;lt;string&amp;gt;   :Kernel (more with -h kernel)&lt;br /&gt;
 &lt;br /&gt;
 Self-Energy:&lt;br /&gt;
 -hf              (-x)            :Hartree-Fock&lt;br /&gt;
 -gw0             (-p) &amp;lt;string&amp;gt;   :GW approximation (more with -h gw0)&lt;br /&gt;
 -dyson           (-g) &amp;lt;string&amp;gt;   :Dyson Equation solver (more with -h dyson)&lt;br /&gt;
 -lifetimes       (-l)            :GoWo Quasiparticle lifetimes&lt;br /&gt;
 &lt;br /&gt;
 Bethe-Salpeter Equation:&lt;br /&gt;
 -Ksolver         (-y) &amp;lt;string&amp;gt;   :BSE solver (more with -h Ksolver)&lt;br /&gt;
 &lt;br /&gt;
 Total Energy:&lt;br /&gt;
 -acfdt                           :ACFDT Total Energy&lt;br /&gt;
 &lt;br /&gt;
 Utilites:&lt;br /&gt;
 ...&lt;br /&gt;
 -slktest                         :ScaLapacK test&lt;br /&gt;
&lt;br /&gt;
The options can be split into two sets: &amp;lt;br&amp;gt;&lt;br /&gt;
* Options needed to generate the appropriate input file (default name: &#039;&#039;yambo.in&#039;&#039;), selecting the kind of simulation you would like to perform &amp;lt;br&amp;gt;&lt;br /&gt;
* Options used to manage auxiliary functions (such as redirecting the I/O, choosing a specific name for the input file, and so on).&lt;br /&gt;
&lt;br /&gt;
===Runlevel selection===&lt;br /&gt;
First of all, you need to specify which kind of simulation you are going to perform, and generate the corresponding input file using the first set of options. &lt;br /&gt;
By default, when generating the input file, Yambo will launch the &amp;lt;code&amp;gt;vi&amp;lt;/code&amp;gt; editor.&lt;br /&gt;
The editor can be changed at configure time, before compilation; alternatively, you can use the &amp;lt;code&amp;gt;-Q&amp;lt;/code&amp;gt; run time option to skip the automatic editing (do this if you are not familiar with &amp;lt;code&amp;gt;vi&amp;lt;/code&amp;gt;!):&lt;br /&gt;
 $ yambo -hf -Q&lt;br /&gt;
 yambo: input file yambo.in created&lt;br /&gt;
 $ emacs yambo.in     &#039;&#039;or your favourite editing tool&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Multiple options can be used together to activate various tasks or runlevels (in some cases this is actually a necessity). &lt;br /&gt;
For instance, to generate an input file for optical spectra including local field effects (Hartree approximation), do (and then exit)&lt;br /&gt;
 $ yambo -optics c -kernel hartree       &#039;&#039;which switches on:&#039;&#039;&lt;br /&gt;
 optics                       # [R] Linear Response optical properties&lt;br /&gt;
 chi                          # [R][CHI] Dyson equation for Chi.&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;            # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
To perform a Hartree-Fock and GW calculation using a plasmon-pole approximation, do (and then exit):&lt;br /&gt;
 $ yambo -hf -gw0 p -dyson n        &#039;&#039;which switches on:&#039;&#039;&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix      &lt;br /&gt;
Each runlevel activates its own list of variables and flags.&lt;br /&gt;
&lt;br /&gt;
The previous command is also equivalent to&lt;br /&gt;
 $ yambo -hf -gw0 r -dyson n -X p&lt;br /&gt;
&lt;br /&gt;
===Changing input parameters ===&lt;br /&gt;
Yambo reads various parameters from existing database files and/or input files and uses them to suggest values or ranges. &lt;br /&gt;
Let&#039;s illustrate this by generating the input file for a Hartree-Fock calculation. &lt;br /&gt;
&lt;br /&gt;
 $ yambo -hf&lt;br /&gt;
Inside the generated input file you should find:&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] =  3187        RL    # [XX] Exchange RL components&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
   1| 14|  6|10|&lt;br /&gt;
 %&lt;br /&gt;
The &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable (follow the link for a &amp;quot;detailed&amp;quot; explanation of any variable) suggests a range of k-points (1 to 14) and bands (here 6 to 10) based on what it finds in the core database &#039;&#039;SAVE/ns.db1&#039;&#039;, i.e. as defined by the DFT code. &amp;lt;br&amp;gt;&lt;br /&gt;
Leave that variable alone, and instead modify the previous variable to &amp;lt;code&amp;gt;EXXRLvcs=  1000        RL&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Save the file, and now generate the input a second time with &amp;lt;code&amp;gt;yambo -x&amp;lt;/code&amp;gt;. You will see:&lt;br /&gt;
  [[Variables#EXXRLvcs|EXXRLvcs]]=  1009        RL&lt;br /&gt;
This indicates that Yambo has read the new input value (1000 G-vectors), checked the database of G-vector shells &#039;&#039;(SAVE/ndb.gops)&#039;&#039;, &lt;br /&gt;
and changed the input value to one that fits a completely closed shell.&lt;br /&gt;
&lt;br /&gt;
Last, note that Yambo variables can be expressed in different &#039;&#039;&#039;units&#039;&#039;&#039;. In this case, &amp;lt;code&amp;gt;RL&amp;lt;/code&amp;gt; can be replaced by an energy unit like Ry, eV, Ha, etc. Energy units are generally better as they are independent of the cell size. Technical information is available on the [[Variables]] page. &lt;br /&gt;
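For instance, the exchange cutoff of the previous section could equally well be given as an energy rather than as a number of G-vectors (the value below is purely illustrative):&lt;br /&gt;
 EXXRLvcs= 10                Ry    # [XX] Exchange RL components&lt;br /&gt;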
&lt;br /&gt;
The input file generator of Yambo is thus an &#039;&#039;intelligent&#039;&#039; parser, which interacts with the user and the existing databases. For this reason we recommend that you always use Yambo to generate the input files, rather than making them yourself.&lt;br /&gt;
&lt;br /&gt;
===Extra options===&lt;br /&gt;
Extra options modify some of the code&#039;s default settings. They can be used when launching the code but also when generating input files. &lt;br /&gt;
&lt;br /&gt;
Let&#039;s have another look at the possible options (again, only the relevant part is reported):&lt;br /&gt;
 $ yambo -h&lt;br /&gt;
 This is      : yambo&lt;br /&gt;
 Version      : 5.0.1 Revision 19547 Hash e90d90f2d &lt;br /&gt;
 Configuration: MPI+OpenMP+SLK+SLEPC+HDF5_MPI_IO &lt;br /&gt;
 &lt;br /&gt;
 Help &amp;amp; version:&lt;br /&gt;
 -help            (-h) &amp;lt;string&amp;gt;   :&amp;lt;string&amp;gt; can be an option (e.g. -h optics)&lt;br /&gt;
 -version                         :Code version &amp;amp; libraries&lt;br /&gt;
 &lt;br /&gt;
 Input file &amp;amp; Directories:&lt;br /&gt;
 -Input           (-F) &amp;lt;string&amp;gt;   :Input file&lt;br /&gt;
 -Verbosity       (-V) &amp;lt;string&amp;gt;   :Input file variables verbosity (more with -h Verbosity)&lt;br /&gt;
 -Job             (-J) &amp;lt;string&amp;gt;   :Job string&lt;br /&gt;
 -Idir            (-I) &amp;lt;string&amp;gt;   :Input directory&lt;br /&gt;
 -Odir            (-O) &amp;lt;string&amp;gt;   :I/O directory&lt;br /&gt;
 -Cdir            (-C) &amp;lt;string&amp;gt;   :Communication directory&lt;br /&gt;
 &lt;br /&gt;
 Parallel Control:&lt;br /&gt;
 -parenv          (-E) &amp;lt;string&amp;gt;   :Environment Parallel Variables file&lt;br /&gt;
 -nompi                           :Switch off MPI support&lt;br /&gt;
 -noopenmp                        :Switch off OPENMP support&lt;br /&gt;
 &lt;br /&gt;
 ...&lt;br /&gt;
 &lt;br /&gt;
 Utilites:&lt;br /&gt;
 -Quiet           (-Q)            :Quiet input file creation&lt;br /&gt;
 -fatlog                          :Verbose (fatter) log(s)&lt;br /&gt;
 -DBlist          (-D)            :Databases properties&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Command line options are extremely important to master if you want to use yambo productively. &lt;br /&gt;
Often, the meaning is clear from the help menu:&lt;br /&gt;
 $ yambo -F yambo.in_HF -hf   &#039;&#039;Make a Hartree-Fock input file called yambo.in_HF&#039;&#039;&lt;br /&gt;
 $ yambo -D                   &#039;&#039;Summarize the content of the databases in the SAVE folder&#039;&#039;&lt;br /&gt;
 $ yambo -I ../               &#039;&#039;Run the code, using a SAVE folder in a directory one level up&#039;&#039;&lt;br /&gt;
 $ yambo -C MyTest            &#039;&#039;Run the code, putting all report, log, plot files inside a folder MyTest&#039;&#039;&lt;br /&gt;
 &lt;br /&gt;
Other options deserve a closer look.&lt;br /&gt;
&lt;br /&gt;
===Verbosity===&lt;br /&gt;
Yambo uses &#039;&#039;many&#039;&#039; input variables, most of which can be left at their default values. To keep input files short and manageable, only a few variables appear by default in the input file. More advanced variables can be switched on by using the &amp;lt;code&amp;gt;-V&amp;lt;/code&amp;gt; verbosity option. These are grouped according to the type of variable. For instance, &amp;lt;code&amp;gt;-V RL&amp;lt;/code&amp;gt; switches on variables related to G-vector summations, and &amp;lt;code&amp;gt;-V io&amp;lt;/code&amp;gt; switches on options related to I/O control. Try: &lt;br /&gt;
&lt;br /&gt;
 $ yambo -optics c -V RL       &#039;&#039;switches on:&#039;&#039;&lt;br /&gt;
 FFTGvecs=  3951        RL    # [FFT] Plane-waves&lt;br /&gt;
 &lt;br /&gt;
 $ yambo -optics c -V io       &#039;&#039;switches on:&#039;&#039;&lt;br /&gt;
 StdoHash=  40                # [IO] Live-timing Hashes&lt;br /&gt;
 DBsIOoff= &amp;quot;none&amp;quot;             # [IO] Space-separated list of DB with NO I/O. DB= ...&lt;br /&gt;
 DBsFRAGpm= &amp;quot;none&amp;quot;            # [IO] Space-separated list of +DB to be FRAG and ...&lt;br /&gt;
 #WFbuffIO                    # [IO] Wave-functions buffered I/O&lt;br /&gt;
&lt;br /&gt;
Unfortunately, -V options must be invoked and changed &#039;&#039;one at a time&#039;&#039;. Once you are more experienced, you may go straight to &amp;lt;code&amp;gt;-V all&amp;lt;/code&amp;gt;, which turns on all possible variables. However, note that &amp;lt;code&amp;gt;yambo -o c -V all&amp;lt;/code&amp;gt; adds an extra 30 variables to the input file, which can be confusing: use it with care.&lt;br /&gt;
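For instance, assuming you keep the default input file name, you can build up the variable set by calling the generator more than once; each call reads the existing &#039;&#039;yambo.in&#039;&#039; and adds its own group of variables (a sketch, using &amp;lt;code&amp;gt;-Q&amp;lt;/code&amp;gt; to skip the editor):&lt;br /&gt;
 $ yambo -optics c -V RL -Q&lt;br /&gt;
 $ yambo -optics c -V io -Q      &#039;&#039;yambo.in now contains both the RL and the io variables&#039;&#039;&lt;br /&gt;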
&lt;br /&gt;
===Job script label===&lt;br /&gt;
The best way to keep track of different runs using different parameters is through the &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag. This inserts a label in all output and report files, and creates a new folder containing any new databases (i.e. they are not written in the core &#039;&#039;SAVE&#039;&#039; folder). Try:&lt;br /&gt;
 $ yambo -V RL -hf -F yambo_hf.in        &#039;&#039;and modify to&#039;&#039;&lt;br /&gt;
 FFTGvecs = 1 Ry&lt;br /&gt;
 EXXRLvcs = 1 Ry&lt;br /&gt;
 VXCRLvcs = 1 Ry&lt;br /&gt;
 $ yambo -J 1Ry -F yambo_hf.in           &#039;&#039;Run the code&#039;&#039;&lt;br /&gt;
 $ ls&lt;br /&gt;
 yambo_hf.in SAVE  &lt;br /&gt;
 o-1Ry.hf r-1Ry_HF_and_locXC 1Ry 1Ry/ndb.HF_and_locXC&lt;br /&gt;
This is extremely useful when running convergence tests, trying out different parameters, etc.&lt;br /&gt;
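As a minimal sketch of such a convergence test (the shell loop, the &amp;lt;code&amp;gt;sed&amp;lt;/code&amp;gt; edit and the file/label names are illustrative shell usage, not Yambo options):&lt;br /&gt;
 $ for ecut in 10 20 40; do&lt;br /&gt;
     yambo -hf -F hf_${ecut}Ry.in -Q&lt;br /&gt;
     sed -i &amp;quot;s/EXXRLvcs.*/EXXRLvcs= ${ecut} Ry/&amp;quot; hf_${ecut}Ry.in&lt;br /&gt;
     yambo -F hf_${ecut}Ry.in -J EXX_${ecut}Ry&lt;br /&gt;
   done&lt;br /&gt;
Each run then produces its own labelled output (e.g. &#039;&#039;o-EXX_10Ry.hf&#039;&#039;), so the results can be compared directly.&lt;br /&gt;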
&lt;br /&gt;
&#039;&#039;Exercise&#039;&#039;: use &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; to report the properties of all database files (including &#039;&#039;ndb.HF_and_locXC&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
==Links==&lt;br /&gt;
* Back to [[ICTP 2022#Tutorials]]&lt;br /&gt;
* Back to [[CECAM VIRTUAL 2021#Tutorials]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| style=&amp;quot;width:100%&amp;quot; border=&amp;quot;1&amp;quot;&lt;br /&gt;
|style=&amp;quot;width:15%; text-align:left&amp;quot;|Prev: [[CECAM_VIRTUAL_2021#Tutorials|CECAM School Home]] &lt;br /&gt;
|style=&amp;quot;width:50%; text-align:center&amp;quot;|Now: CECAM School Home -&amp;gt; [[First_steps:_walk_through_from_DFT(standalone)|First steps]] &lt;br /&gt;
|style=&amp;quot;width:35%; text-align:right&amp;quot;|Next: CECAM School Home -&amp;gt; [[Next steps: RPA calculations (standalone)|Next steps]]&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=First_steps:_walk_through_from_DFT(standalone)&amp;diff=8674</id>
		<title>First steps: walk through from DFT(standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=First_steps:_walk_through_from_DFT(standalone)&amp;diff=8674"/>
		<updated>2025-05-18T13:55:23Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Input file generator */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In this tutorial you will learn how to calculate optical spectra using Yambo, starting from a DFT calculation and ending with a look at local field effects in the optical response. &lt;br /&gt;
&lt;br /&gt;
== System characteristics ==&lt;br /&gt;
We will use a 3D system (bulk hBN) and a 2D system (hBN sheet). &lt;br /&gt;
&lt;br /&gt;
[[File:HBN-bulk-3x3-annotated.png|x200px|Atomic structure of bulk hBN]]&lt;br /&gt;
[[File:HBN2.png|x200px|Atomic structure of 2D hBN]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hexagonal boron nitride - hBN&#039;&#039;&#039;:&lt;br /&gt;
* HCP lattice, ABAB stacking&lt;br /&gt;
* Four atoms per cell, B and N (16 electrons)&lt;br /&gt;
* Lattice constants: &#039;&#039;a&#039;&#039; = 4.716 [a.u.], &#039;&#039;c/a&#039;&#039; = 2.582&lt;br /&gt;
* Plane wave cutoff 40 Ry (~1500 RL vectors in wavefunctions)&lt;br /&gt;
* SCF run: shifted &#039;&#039;6x6x2&#039;&#039; grid (12 k-points) with 8 bands &lt;br /&gt;
* Non-SCF run: gamma-centred &#039;&#039;6x6x2&#039;&#039; (14 k-points) grid with 100 bands&lt;br /&gt;
&lt;br /&gt;
=== Prerequisites ===&lt;br /&gt;
&#039;&#039;&#039;You will need&#039;&#039;&#039;:&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* PWSCF input files and pseudopotentials for hBN bulk &lt;br /&gt;
* &amp;lt;code&amp;gt;pw.x&amp;lt;/code&amp;gt; executable, version 5.0 or later&lt;br /&gt;
* &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; executables&lt;br /&gt;
* &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; for plotting spectra&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* Before starting, [[Get Tutorial files CECAM2021|get tutorial files]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
* Before starting, get the hBN tutorial files [https://www.yambo-code.eu/wiki/index.php/Tutorials#Tutorial_files here]&lt;br /&gt;
* &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; executable&lt;br /&gt;
* &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; for plotting spectra&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
==Download the Files==&lt;br /&gt;
&lt;br /&gt;
Download and unpack the following files:&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN.tar.gz hBN.tar.gz] [15 MB],&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN-2D.tar.gz hBN-2D.tar.gz] [8,6 MB]&lt;br /&gt;
&lt;br /&gt;
In the next days you will also use this file, which you may like to download now:&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN-convergence-kpoints.tar.gz hBN-convergence-kpoints.tar.gz] [254 MB]&lt;br /&gt;
&lt;br /&gt;
After downloading the tar.gz files just unpack them in the &#039;&#039;&#039;YAMBO_TUTORIALS&#039;&#039;&#039; folder. For example&lt;br /&gt;
 $ mkdir YAMBO_TUTORIALS&lt;br /&gt;
 $ mv hBN.tar.gz YAMBO_TUTORIALS&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ tar -xzvf hBN.tar.gz&lt;br /&gt;
 $ ls YAMBO_TUTORIALS&lt;br /&gt;
   hBN&lt;br /&gt;
 $ cd hBN&lt;br /&gt;
 $ ls&lt;br /&gt;
 PWSCF YAMBO&lt;br /&gt;
&lt;br /&gt;
(Advanced users can download and install all tutorial files using git. See the main  [[Tutorials#Files|Tutorial Files]] page.)&lt;br /&gt;
&lt;br /&gt;
Now you can go directly to the &#039;&#039;&#039;YAMBO&#039;&#039;&#039; folder, where you will find the &#039;&#039;&#039;SAVE&#039;&#039;&#039; folder needed to start, and jump to &#039;&#039;&#039;[Initialization of Yambo databases]&#039;&#039;&#039; below, which is always the first step of any simulation with the Yambo code.&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can also learn how to start from the DFT simulations, performing an scf and an nscf calculation in the &#039;&#039;&#039;PWSCF&#039;&#039;&#039; folder. In this way you will see how to create the &#039;&#039;&#039;SAVE&#039;&#039;&#039; folder starting from the *.save directory produced by pw.x.&lt;br /&gt;
&lt;br /&gt;
==DFT calculation of bulk hBN and conversion to Yambo==&lt;br /&gt;
&lt;br /&gt;
In this module you will learn how to generate the Yambo &#039;&#039;SAVE&#039;&#039; folder for bulk hBN starting from a PWscf calculation.&lt;br /&gt;
&lt;br /&gt;
=== DFT calculations ===&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/PWSCF&lt;br /&gt;
 $ ls&lt;br /&gt;
 Inputs		Pseudos		PostProcessing		References&lt;br /&gt;
 hBN_scf.in	hBN_nscf.in     hBN_scf_plot_bands.in  hBN_nscf_plot_bands.in &lt;br /&gt;
&lt;br /&gt;
First run the SCF calculation to generate the ground-state charge density, occupations, Fermi level, and so on:&lt;br /&gt;
 $ pw.x &amp;lt; hBN_scf.in &amp;gt; hBN_scf.out&lt;br /&gt;
Inspection of the output shows that the valence band maximum lies at 5.06 eV.&lt;br /&gt;
&lt;br /&gt;
Next run a non-SCF calculation to generate a set of Kohn-Sham eigenvalues and eigenvectors for both occupied and unoccupied states (100 bands): &lt;br /&gt;
 $ pw.x &amp;lt; hBN_nscf.in &amp;gt; hBN_nscf.out                  &#039;&#039;(serial run, ~1 min) OR&#039;&#039;&lt;br /&gt;
 $ mpirun -np 2 pw.x &amp;lt; hBN_nscf.in &amp;gt; hBN_nscf.out       &#039;&#039;(parallel run, 40s)&#039;&#039;&lt;br /&gt;
Here we use a &#039;&#039;6x6x2&#039;&#039; grid giving 14 k-points, but denser grids should be used for checking convergence of Yambo runs.&lt;br /&gt;
&lt;br /&gt;
Note the presence of the following flags in the input file:&lt;br /&gt;
 wf_collect=.true.&lt;br /&gt;
 force_symmorphic=.true.&lt;br /&gt;
 diago_thr_init=5.0e-6,&lt;br /&gt;
 diago_full_acc=.true.&lt;br /&gt;
which are needed for generating the Yambo databases accurately. Full explanations of these variables are given on the [http://www.quantum-espresso.org/wp-content/uploads/Doc/INPUT_PW.html quantum-ESPRESSO input variables page]. &lt;br /&gt;
&lt;br /&gt;
After these two runs, you should have a &#039;&#039;hBN.save&#039;&#039; directory:&lt;br /&gt;
 $ ls hBN.save&lt;br /&gt;
 data-file.xml charge-density.dat gvectors.dat B.pz-vbc.UPF N.pz-vbc.UPF&lt;br /&gt;
 K00001	K00002 .... 	K00035	K00036&lt;br /&gt;
&lt;br /&gt;
=== Conversion to Yambo format ===&lt;br /&gt;
Once you have performed the nscf simulation with pw.x, the PWscf &#039;&#039;hBN.save&#039;&#039; directory is populated and you can convert it to the Yambo format using the &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt; executable (pwscf to yambo), found in the yambo &#039;&#039;bin&#039;&#039; directory.&lt;br /&gt;
Enter &#039;&#039;hBN.save&#039;&#039; and launch &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd hBN.save&lt;br /&gt;
 $ p2y&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; DBs path set to .&lt;br /&gt;
 &amp;lt;--%-&amp;gt; Index file set to data-file.xml&lt;br /&gt;
 &amp;lt;--%-&amp;gt; Header/K-points/Energies... done&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB1 (Gvecs and more) ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; ... Database done&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB2 (wavefunctions)  ... done ==&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB3 (PseudoPotential) ... done ==&lt;br /&gt;
 &amp;lt;--%-&amp;gt;  == P2Y completed ==&lt;br /&gt;
&lt;br /&gt;
This output repeats some information about the system and generates a &#039;&#039;SAVE&#039;&#039; directory:&lt;br /&gt;
 $ ls SAVE&lt;br /&gt;
 ns.db1  ns.wf  ns.kb_pp_pwscf&lt;br /&gt;
 ns.wf_fragments_1_1 ...&lt;br /&gt;
 ns.kb_pp_pwscf_fragment_1 ...&lt;br /&gt;
These files, with an &#039;&#039;n&#039;&#039; prefix, indicate that they are in netCDF format, and thus not human readable. However, they are perfectly transferable across different architectures. You can check that the databases contain the information you expect by launching Yambo using the &amp;lt;code&amp;gt;-D&amp;lt;/code&amp;gt; option:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -D&lt;br /&gt;
 [RD./SAVE//ns.db1]------------------------------------------&lt;br /&gt;
 Bands                           : 100&lt;br /&gt;
 K-points                        : 14&lt;br /&gt;
 G-vectors             [RL space]:  8029&lt;br /&gt;
 Components       [wavefunctions]: 1016&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./SAVE//ns.wf]-------------------------------------------&lt;br /&gt;
 Fragmentation                    :yes&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./SAVE//ns.kb_pp_pwscf]----------------------------------&lt;br /&gt;
 Fragmentation                    :yes&lt;br /&gt;
 - S/N 006626 -------------------------- v.04.01.02 r.00000 -&lt;br /&gt;
&lt;br /&gt;
In practice we suggest to move the &#039;&#039;SAVE&#039;&#039; folder into a new clean folder. &lt;br /&gt;
&lt;br /&gt;
In this tutorial however, we ask instead that you continue using a &#039;&#039;SAVE&#039;&#039; folder that we prepared previously:&lt;br /&gt;
 $ cd ../../YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Initialization of Yambo databases==&lt;br /&gt;
&amp;lt;!-- Use the &#039;&#039;SAVE&#039;&#039; folders that are already provided, rather than any ones you may have generated previously.  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Every Yambo run &#039;&#039;&#039;must&#039;&#039;&#039; start with this step. Go to the folder &#039;&#039;containing&#039;&#039; the hBN-bulk &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory:&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: do not run yambo from &#039;&#039;inside&#039;&#039; the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; folder!&lt;br /&gt;
&#039;&#039;&#039;This is the wrong way .. &#039;&#039;&#039;&lt;br /&gt;
 $ cd SAVE&lt;br /&gt;
 $ yambo&lt;br /&gt;
 yambo: cannot access CORE database (SAVE/*db1 and/or SAVE/*wf)&lt;br /&gt;
In fact, if you ever see such a message, it usually means you are trying to launch Yambo &#039;&#039;&#039;from the wrong place&#039;&#039;&#039;.&lt;br /&gt;
 $ cd ..&lt;br /&gt;
&lt;br /&gt;
Now you are in the proper place and&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
you can simply launch the code&lt;br /&gt;
 $ yambo &lt;br /&gt;
This will run the initialization (setup) &#039;&#039;runlevel&#039;&#039;. &lt;br /&gt;
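Equivalently, the setup runlevel can be selected explicitly through its command line option (listed in the &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt; output shown later). As a sketch, assuming the default input file name:&lt;br /&gt;
 $ yambo -i -Q          &#039;&#039;generate the setup input file only&#039;&#039;&lt;br /&gt;
 $ yambo -F yambo.in    &#039;&#039;then run it&#039;&#039;&lt;br /&gt;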
&lt;br /&gt;
===Run-time output===&lt;br /&gt;
This is typically written to standard output (on screen) and tracks the progress of the run in real time:&lt;br /&gt;
 &amp;lt;---&amp;gt; [01] MPI/OPENMP structure, Files &amp;amp; I/O Directories&lt;br /&gt;
 &amp;lt;---&amp;gt; [02] CORE Variables Setup&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.01] Unit cells&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.02] Symmetries&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.03] Reciprocal space&lt;br /&gt;
 &amp;lt;---&amp;gt; Shells finder |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.04] K-grid lattice&lt;br /&gt;
 &amp;lt;---&amp;gt; Grid dimensions      :   6   6   2&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.05] Energies &amp;amp; Occupations&lt;br /&gt;
 &amp;lt;---&amp;gt; [03] Transferred momenta grid and indexing&lt;br /&gt;
 &amp;lt;---&amp;gt; BZ -&amp;gt; IBZ reduction |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [03.01] X indexes&lt;br /&gt;
 &amp;lt;---&amp;gt; X [eval] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; X[REDUX] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [03.01.01] Sigma indexes&lt;br /&gt;
 &amp;lt;---&amp;gt; Sigma [eval] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; Sigma[REDUX] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [04] Timing Overview&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Memory Overview&lt;br /&gt;
 &amp;lt;---&amp;gt; [06] Game Over &amp;amp; Game summary&lt;br /&gt;
Specific runlevels are indicated with numeric labels like [02.02]. &amp;lt;br&amp;gt;&lt;br /&gt;
The hashes (#) indicate progress of the run in Wall Clock time, indicating the elapsed (E) and expected (X) time to complete a runlevel, and the percentage of the task complete.&lt;br /&gt;
In this case the simulation is so fast that no live timing is shown. In longer simulations you will be able to appreciate this feature.&lt;br /&gt;
&lt;br /&gt;
===New core databases===&lt;br /&gt;
New databases appear in the &#039;&#039;SAVE&#039;&#039; folder:&lt;br /&gt;
 $ ls SAVE&lt;br /&gt;
 ns.db1 ns.wf ns.kb_pp_pwscf &#039;&#039;&#039;ndb.gops ndb.kindx&#039;&#039;&#039;&lt;br /&gt;
 ns.wf_fragments_1_1 ...&lt;br /&gt;
 ns.kb_pp_pwscf_fragment_1 ...&lt;br /&gt;
These contain information about the &#039;&#039;G&#039;&#039;-vector shells and &#039;&#039;k/q&#039;&#039;-point meshes as defined by the DFT calculation.&lt;br /&gt;
&lt;br /&gt;
In general: a database called &#039;&#039;n&#039;&#039;&#039;s&#039;&#039;&#039;.xxx&#039;&#039; is a &#039;&#039;static&#039;&#039; database, generated once by &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt;, while databases called &#039;&#039;n&#039;&#039;&#039;db&#039;&#039;&#039;.xxx&#039;&#039; are &#039;&#039;dynamically&#039;&#039; generated while you use &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: if you launch &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;, but it does not seem to do anything, check that these files are present.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Report file===&lt;br /&gt;
A report file &#039;&#039;r_setup&#039;&#039; is generated in the run directory. &lt;br /&gt;
This mostly reports information about the ground state system as defined by the DFT run, but also adds information about the band gaps, occupations, shells of G-vectors, IBZ/BZ grids, the CPU structure (for parallel runs), and so on. Some points of note:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Up to Yambo version 4.5&lt;br /&gt;
  [02.03] RL shells&lt;br /&gt;
  =================&lt;br /&gt;
  Shells, format: [S#] G_RL(mHa)&lt;br /&gt;
   [S453]:8029(0.7982E+5) [S452]:8005(0.7982E+5) [S451]:7981(0.7982E+5) [S450]:7957(0.7942E+5)&lt;br /&gt;
   ...&lt;br /&gt;
   [S4]:11( 1183.) [S3]:5( 532.5123) [S2]:3( 133.1281) [S1]:1( 0.000000)&lt;br /&gt;
&lt;br /&gt;
From Yambo version 5.0&lt;br /&gt;
  [02.03] Reciprocal space&lt;br /&gt;
  ========================&lt;br /&gt;
  &lt;br /&gt;
  nG shells         :  217&lt;br /&gt;
  nG charge         :   3187&lt;br /&gt;
  nG WFs            :  1477&lt;br /&gt;
  nC WFs            :  1016&lt;br /&gt;
  G-vecs. in first 21 shells:  [ Number ]&lt;br /&gt;
     1    3    5   11   13   25   37   39   51&lt;br /&gt;
    63   65   71   83   95  107  113  125  127&lt;br /&gt;
   139  151  163&lt;br /&gt;
  ...&lt;br /&gt;
  Shell energy in first 21 shells:  [ mHa ]&lt;br /&gt;
    0.00000      133.128      532.512      1183.37      1198.15      1316.50      1715.88      2130.05      2381.52&lt;br /&gt;
    3313.42      3328.20      3550.11      3683.24      4082.62      4511.57      4733.48      4748.27      4792.61&lt;br /&gt;
    4866.61      5266.00      5680.16&lt;br /&gt;
  ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This reports the set of closed reciprocal lattice (RL) shells defined internally that contain G-vectors with the same modulus. &lt;br /&gt;
The highest number of RL vectors we can use is 8029.  Yambo will always redefine any input variable in RL units to the nearest closed shell.&lt;br /&gt;
&lt;br /&gt;
Up to Yambo version 4.5&lt;br /&gt;
  [02.05] Energies [ev] &amp;amp; Occupations&lt;br /&gt;
  ===================================&lt;br /&gt;
  Fermi Level        [ev]:  5.112805&lt;br /&gt;
  VBM / CBm          [ev]:  0.000000  3.876293&lt;br /&gt;
  Electronic Temp. [ev K]:  0.00      0.00&lt;br /&gt;
  Bosonic    Temp. [ev K]:  0.00      0.00&lt;br /&gt;
  El. density      [cm-3]: 0.460E+24&lt;br /&gt;
  States summary         : Full        Metallic    Empty&lt;br /&gt;
                           0001-0008               0009-0100&lt;br /&gt;
  Indirect Gaps      [ev]: 3.876293  7.278081&lt;br /&gt;
  Direct Gaps        [ev]:  4.28829  11.35409&lt;br /&gt;
  X BZ K-points :  72&lt;br /&gt;
&lt;br /&gt;
From Yambo version 5.0&lt;br /&gt;
  [02.05] Energies &amp;amp; Occupations&lt;br /&gt;
  ==============================&lt;br /&gt;
  &lt;br /&gt;
  [X] === General ===&lt;br /&gt;
  [X] Electronic Temperature                        :  0.000000  0.000000 [eV K]&lt;br /&gt;
  [X] Bosonic    Temperature                        :  0.000000  0.000000 [eV K]&lt;br /&gt;
  [X] Finite Temperature mode                       : no&lt;br /&gt;
  [X] El. density                                   :  0.46037E+24 [cm-3]&lt;br /&gt;
  [X] Fermi Level                                   :  5.110835 [eV]&lt;br /&gt;
  &lt;br /&gt;
  [X] === Gaps and Widths ===&lt;br /&gt;
  [X] Conduction Band Min                           :  3.877976 [eV]&lt;br /&gt;
  [X] Valence Band Max                              :  0.000000 [eV]&lt;br /&gt;
  [X] Filled Bands                                  :   8&lt;br /&gt;
  [X] Empty Bands                                   :    9  100&lt;br /&gt;
  [X] Direct Gap                                    :  4.289853 [eV]&lt;br /&gt;
  [X] Direct Gap localized at k-point               :   7&lt;br /&gt;
  [X] Indirect Gap                                  :  3.877976 [eV]&lt;br /&gt;
  [X] Indirect Gap between k-points                 :  14   7&lt;br /&gt;
  [X] Last valence band width                       :  3.401086 [eV]&lt;br /&gt;
  [X] 1st conduction band width                     :  4.266292 [eV]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yambo recalculates the Fermi level (close to the value of 5.06 eV noted in the PWscf SCF calculation). From here on, however, the Fermi level is set to zero, and all other eigenvalues are shifted accordingly. The system is insulating (8 filled bands, 92 empty) with an indirect band gap of 3.87 eV. The direct and indirect gaps are both reported. There are 72 k-points in the full BZ, generated using symmetry from the 14 k-points in our user-defined grid.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: You should inspect the report file after &#039;&#039;every&#039;&#039; run for errors and warnings.&lt;br /&gt;
&lt;br /&gt;
===Different ways of running yambo===&lt;br /&gt;
So far we have simply run Yambo interactively.&lt;br /&gt;
&lt;br /&gt;
Let&#039;s try to re-run the setup with the command&lt;br /&gt;
 $ nohup yambo &amp;amp;&lt;br /&gt;
 $ ls&lt;br /&gt;
 l_setup  nohup.out  r_setup  r_setup_01  SAVE&lt;br /&gt;
&lt;br /&gt;
If Yambo is launched using a script, or as a background process, or in parallel, this output will appear in a log file prefixed by the letter &#039;&#039;l&#039;&#039;, in this case as &#039;&#039;l_setup&#039;&#039;.&lt;br /&gt;
If this log file already exists from a previous run, it will not be overwritten. Instead, a new file will be created with an incrementing numerical label, e.g. &#039;&#039;l_setup_01, l_setup_02&#039;&#039;, etc. &#039;&#039;&#039;This applies to all files created by Yambo&#039;&#039;&#039;. Here we see that &#039;&#039;l_setup&#039;&#039; was created for the first time, but &#039;&#039;r_setup&#039;&#039; already existed from the previous run, so now we have &#039;&#039;r_setup_01&#039;&#039;.&lt;br /&gt;
If you compare the two report files, you will notice that in the second run yambo reads the previously created &#039;&#039;ndb.kindx&#039;&#039; instead of re-computing the indexes.&lt;br /&gt;
Indeed, the log in &#039;&#039;l_setup&#039;&#039; does not show the timing for the X and Sigma indexes.&lt;br /&gt;
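A quick way to compare the two reports is a plain &amp;lt;code&amp;gt;diff&amp;lt;/code&amp;gt; (the exact differences reported will of course depend on your run):&lt;br /&gt;
 $ diff r_setup r_setup_01&lt;br /&gt;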
&lt;br /&gt;
As a last step we run the setup in parallel; before doing so, we delete the &#039;&#039;ndb.kindx&#039;&#039; file:&lt;br /&gt;
 $ rm SAVE/ndb.kindx&lt;br /&gt;
 $ mpirun -np 4 yambo &lt;br /&gt;
 $ ls&lt;br /&gt;
 LOG  l_setup  nohup.out  r_setup  r_setup_01  r_setup_02  SAVE&lt;br /&gt;
There is now an additional report file, &#039;&#039;r_setup_02&#039;&#039;.&lt;br /&gt;
In the case of parallel runs, CPU-dependent log files will appear inside a &#039;&#039;LOG&#039;&#039; folder, e.g. &lt;br /&gt;
 $ ls LOG&lt;br /&gt;
 l_setup_CPU_1   l_setup_CPU_2  l_setup_CPU_3  l_setup_CPU_4&lt;br /&gt;
This behaviour can be controlled at runtime - see the Parallel tutorial for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===2D hBN===&lt;br /&gt;
Simply repeat the steps above. Go to the folder &#039;&#039;containing&#039;&#039; the hBN-sheet &#039;&#039;SAVE&#039;&#039; directory and launch &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;:&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN-2D/YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
 $ yambo&lt;br /&gt;
Again, inspect the &#039;&#039;r_setup&#039;&#039; file and the output logs, and verify that &#039;&#039;ndb.gops&#039;&#039; and &#039;&#039;ndb.kindx&#039;&#039; have been created inside the SAVE folder.&lt;br /&gt;
&lt;br /&gt;
You are now ready to use Yambo!&lt;br /&gt;
&lt;br /&gt;
==Yambo&#039;s command line interface==&lt;br /&gt;
Yambo uses a command line interface to select tasks, generate input files, and control the runtime behaviour. &lt;br /&gt;
&lt;br /&gt;
In this module you will learn how to select tasks, generate and modify input files, and control the runtime behaviour by using Yambo&#039;s command line interface.&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Command line options are divided into &#039;&#039;&#039;uppercase&#039;&#039;&#039; and &#039;&#039;&#039;lowercase&#039;&#039;&#039; options:&lt;br /&gt;
* Lowercase: select tasks, generate input files, and (by default) launch a file editor&lt;br /&gt;
* Uppercase: modify Yambo&#039;s default settings, at run time and when generating input files&lt;br /&gt;
Lowercase and uppercase options can be used together.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Input file generator ===&lt;br /&gt;
We are going to work again with bulk hBN.&lt;br /&gt;
First, move to the appropriate folder and initialize the Yambo databases if you haven&#039;t already done so.&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo                    &#039;&#039;(initialize)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Yambo generates its own input files: you just tell the code what you want to calculate by launching Yambo along with one or more options. &lt;br /&gt;
To see the list of possible options, run &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt; (only the part relevant here is reported):&lt;br /&gt;
 $ yambo -h&lt;br /&gt;
 &#039;A shiny pot of fun and happiness [C.D.Hogan]&#039; &lt;br /&gt;
 &lt;br /&gt;
 This is      : yambo&lt;br /&gt;
 Version      : 5.3.0 Revision 19547 Hash e90d90f2d&lt;br /&gt;
 Configuration: MPI+OpenMP+SLK+SLEPC+HDF5_MPI_IO&lt;br /&gt;
 &lt;br /&gt;
 ...&lt;br /&gt;
 &lt;br /&gt;
 Initializations:&lt;br /&gt;
 -setup           (-i)            :Initialization&lt;br /&gt;
 -coulomb         (-r)            :Coulomb potential&lt;br /&gt;
 -rw              (-w)            :Screened coulomb potential&lt;br /&gt;
 &lt;br /&gt;
 Response Functions:&lt;br /&gt;
 -optics          (-o) &amp;lt;string&amp;gt;   :Linear Response optical properties (more with -h optics)&lt;br /&gt;
 -X               (-d) &amp;lt;string&amp;gt;   :Inverse Dielectric Matrix (more with -h X)&lt;br /&gt;
 -dipoles         (-q)            :Oscillator strenghts (or dipoles)&lt;br /&gt;
 -kernel          (-k) &amp;lt;string&amp;gt;   :Kernel (more with -h kernel)&lt;br /&gt;
 &lt;br /&gt;
 Self-Energy:&lt;br /&gt;
 -hf              (-x)            :Hartree-Fock&lt;br /&gt;
 -gw0             (-p) &amp;lt;string&amp;gt;   :GW approximation (more with -h gw0)&lt;br /&gt;
 -dyson           (-g) &amp;lt;string&amp;gt;   :Dyson Equation solver (more with -h dyson)&lt;br /&gt;
 -lifetimes       (-l)            :GoWo Quasiparticle lifetimes&lt;br /&gt;
 &lt;br /&gt;
 Bethe-Salpeter Equation:&lt;br /&gt;
 -Ksolver         (-y) &amp;lt;string&amp;gt;   :BSE solver (more with -h Ksolver)&lt;br /&gt;
 &lt;br /&gt;
 Total Energy:&lt;br /&gt;
 -acfdt                           :ACFDT Total Energy&lt;br /&gt;
 &lt;br /&gt;
 Utilites:&lt;br /&gt;
 ...&lt;br /&gt;
 -slktest                         :ScaLapacK test&lt;br /&gt;
&lt;br /&gt;
The options can be split into two sets: &amp;lt;br&amp;gt;&lt;br /&gt;
* Options needed to generate the appropriate input file (default name: &#039;&#039;yambo.in&#039;&#039;), selecting the kind of simulation you would like to perform &amp;lt;br&amp;gt;&lt;br /&gt;
* Options used to manage auxiliary functions (such as redirecting the I/O, choosing a specific name for the input file, and so on).&lt;br /&gt;
&lt;br /&gt;
===Runlevel selection===&lt;br /&gt;
First of all, you need to specify which kind of simulation you are going to perform, and generate the corresponding input file using the first set of options. &lt;br /&gt;
By default, when generating the input file, Yambo will launch the &amp;lt;code&amp;gt;vi&amp;lt;/code&amp;gt; editor.&lt;br /&gt;
The editor can be changed at configure time, before compilation; alternatively, you can use the &amp;lt;code&amp;gt;-Q&amp;lt;/code&amp;gt; run time option to skip the automatic editing (do this if you are not familiar with &amp;lt;code&amp;gt;vi&amp;lt;/code&amp;gt;!):&lt;br /&gt;
 $ yambo -hf -Q&lt;br /&gt;
 yambo: input file yambo.in created&lt;br /&gt;
 $ emacs yambo.in     &#039;&#039;or your favourite editing tool&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Multiple options can be used together to activate various tasks or runlevels (in some cases this is actually a necessity). &lt;br /&gt;
For instance, to generate an input file for optical spectra including local field effects (Hartree approximation), do (and then exit)&lt;br /&gt;
 $ yambo -optics c -kernel hartree       &#039;&#039;which switches on:&#039;&#039;&lt;br /&gt;
 optics                       # [R] Linear Response optical properties&lt;br /&gt;
 chi                          # [R][CHI] Dyson equation for Chi.&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;            # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
To perform a Hartree-Fock and GW calculation using a plasmon-pole approximation, do (and then exit):&lt;br /&gt;
 $ yambo -hf -gw0 p -dyson n        &#039;&#039;which switches on:&#039;&#039;&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix      &lt;br /&gt;
Each runlevel activates its own list of variables and flags.&lt;br /&gt;
&lt;br /&gt;
The previous command is also equivalent to&lt;br /&gt;
 $ yambo -hf -gw0 r -dyson n -X p&lt;br /&gt;
&lt;br /&gt;
===Changing input parameters ===&lt;br /&gt;
Yambo reads various parameters from existing database files and/or input files and uses them to suggest values or ranges. &lt;br /&gt;
Let&#039;s illustrate this by generating the input file for a Hartree-Fock calculation. &lt;br /&gt;
&lt;br /&gt;
 $ yambo -hf&lt;br /&gt;
Inside the generated input file you should find:&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] =  3187        RL    # [XX] Exchange RL components&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
   1| 14|  6|10|&lt;br /&gt;
 %&lt;br /&gt;
The &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable (follow the link for a &amp;quot;detailed&amp;quot; explanation of any variable) suggests a range of k-points (1 to 14) and bands (here 6 to 10) based on what it finds in the core database &#039;&#039;SAVE/ns.db1&#039;&#039;, i.e. as defined by the DFT code. &amp;lt;br&amp;gt;&lt;br /&gt;
Leave that variable alone, and instead modify the previous variable to &amp;lt;code&amp;gt;EXXRLvcs=  1000        RL&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Save the file, and now generate the input a second time with &amp;lt;code&amp;gt;yambo -x&amp;lt;/code&amp;gt;. You will see:&lt;br /&gt;
  [[Variables#EXXRLvcs|EXXRLvcs]]=  1009        RL&lt;br /&gt;
This indicates that Yambo has read the new input value (1000 G-vectors), checked the database of G-vector shells &#039;&#039;(SAVE/ndb.gops)&#039;&#039;, &lt;br /&gt;
and changed the input value to one that fits a completely closed shell.&lt;br /&gt;
&lt;br /&gt;
Last, note that Yambo variables can be expressed in different &#039;&#039;&#039;units&#039;&#039;&#039;. In this case, &amp;lt;code&amp;gt;RL&amp;lt;/code&amp;gt; can be replaced by an energy unit like Ry, eV, Ha, etc. Energy units are generally better as they are independent of the cell size. Technical information is available on the [[Variables]] page. &lt;br /&gt;
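For instance, the exchange cutoff of the previous section could equally well be given as an energy rather than as a number of G-vectors (the value below is purely illustrative):&lt;br /&gt;
 EXXRLvcs= 10                Ry    # [XX] Exchange RL components&lt;br /&gt;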
&lt;br /&gt;
The input file generator of Yambo is thus an &#039;&#039;intelligent&#039;&#039; parser, which interacts with the user and the existing databases. For this reason we recommend that you always use Yambo to generate the input files, rather than making them yourself.&lt;br /&gt;
&lt;br /&gt;
===Extra options===&lt;br /&gt;
Extra options modify some of the code&#039;s default settings. They can be used when launching the code but also when generating input files. &lt;br /&gt;
&lt;br /&gt;
Let&#039;s have another look at the possible options (again, only the relevant part is reported):&lt;br /&gt;
 $ yambo -h&lt;br /&gt;
 This is      : yambo&lt;br /&gt;
 Version      : 5.0.1 Revision 19547 Hash e90d90f2d &lt;br /&gt;
 Configuration: MPI+OpenMP+SLK+SLEPC+HDF5_MPI_IO &lt;br /&gt;
 &lt;br /&gt;
 Help &amp;amp; version:&lt;br /&gt;
 -help            (-h) &amp;lt;string&amp;gt;   :&amp;lt;string&amp;gt; can be an option (e.g. -h optics)&lt;br /&gt;
 -version                         :Code version &amp;amp; libraries&lt;br /&gt;
 &lt;br /&gt;
 Input file &amp;amp; Directories:&lt;br /&gt;
 -Input           (-F) &amp;lt;string&amp;gt;   :Input file&lt;br /&gt;
 -Verbosity       (-V) &amp;lt;string&amp;gt;   :Input file variables verbosity (more with -h Verbosity)&lt;br /&gt;
 -Job             (-J) &amp;lt;string&amp;gt;   :Job string&lt;br /&gt;
 -Idir            (-I) &amp;lt;string&amp;gt;   :Input directory&lt;br /&gt;
 -Odir            (-O) &amp;lt;string&amp;gt;   :I/O directory&lt;br /&gt;
 -Cdir            (-C) &amp;lt;string&amp;gt;   :Communication directory&lt;br /&gt;
 &lt;br /&gt;
 Parallel Control:&lt;br /&gt;
 -parenv          (-E) &amp;lt;string&amp;gt;   :Environment Parallel Variables file&lt;br /&gt;
 -nompi                           :Switch off MPI support&lt;br /&gt;
 -noopenmp                        :Switch off OPENMP support&lt;br /&gt;
 &lt;br /&gt;
 ...&lt;br /&gt;
 &lt;br /&gt;
 Utilites:&lt;br /&gt;
 -Quiet           (-Q)            :Quiet input file creation&lt;br /&gt;
 -fatlog                          :Verbose (fatter) log(s)&lt;br /&gt;
 -DBlist          (-D)            :Databases properties&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Command line options are extremely important to master if you want to use yambo productively. &lt;br /&gt;
Often, the meaning is clear from the help menu:&lt;br /&gt;
 $ yambo -F yambo.in_HF -hf   &#039;&#039;Make a Hartree-Fock input file called yambo.in_HF&#039;&#039;&lt;br /&gt;
 $ yambo -D                   &#039;&#039;Summarize the content of the databases in the SAVE folder&#039;&#039;&lt;br /&gt;
 $ yambo -I ../               &#039;&#039;Run the code, using a SAVE folder in a directory one level up&#039;&#039;&lt;br /&gt;
 $ yambo -C MyTest            &#039;&#039;Run the code, putting all report, log, plot files inside a folder MyTest&#039;&#039;&lt;br /&gt;
 &lt;br /&gt;
Other options deserve a closer look.&lt;br /&gt;
&lt;br /&gt;
===Verbosity===&lt;br /&gt;
Yambo uses &#039;&#039;many&#039;&#039; input variables, most of which can be left at their default values. To keep input files short and manageable, only a few variables appear by default in the input file. More advanced variables can be switched on by using the &amp;lt;code&amp;gt;-V&amp;lt;/code&amp;gt; verbosity option. These are grouped according to the type of variable. For instance, &amp;lt;code&amp;gt;-V RL&amp;lt;/code&amp;gt; switches on variables related to G-vector summations, and &amp;lt;code&amp;gt;-V io&amp;lt;/code&amp;gt; switches on options related to I/O control. Try: &lt;br /&gt;
&lt;br /&gt;
 $ yambo -optics c -V RL       &#039;&#039;switches on:&#039;&#039;&lt;br /&gt;
 FFTGvecs=  3951        RL    # [FFT] Plane-waves&lt;br /&gt;
 &lt;br /&gt;
 $ yambo -optics c -V io       &#039;&#039;switches on:&#039;&#039;&lt;br /&gt;
 StdoHash=  40                # [IO] Live-timing Hashes&lt;br /&gt;
 DBsIOoff= &amp;quot;none&amp;quot;             # [IO] Space-separated list of DB with NO I/O. DB= ...&lt;br /&gt;
 DBsFRAGpm= &amp;quot;none&amp;quot;            # [IO] Space-separated list of +DB to be FRAG and ...&lt;br /&gt;
 #WFbuffIO                    # [IO] Wave-functions buffered I/O&lt;br /&gt;
&lt;br /&gt;
Unfortunately, -V options must be invoked and changed &#039;&#039;one at a time&#039;&#039;. Once you are more experienced, you may go straight to &amp;lt;code&amp;gt;-V all&amp;lt;/code&amp;gt;, which turns on all possible variables. However, note that &amp;lt;code&amp;gt;yambo -o c -V all&amp;lt;/code&amp;gt; adds an extra 30 variables to the input file, which can be confusing: use it with care.&lt;br /&gt;
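For instance, assuming you keep the default input file name, you can build up the variable set by calling the generator more than once; each call reads the existing &#039;&#039;yambo.in&#039;&#039; and adds its own group of variables (a sketch, using &amp;lt;code&amp;gt;-Q&amp;lt;/code&amp;gt; to skip the editor):&lt;br /&gt;
 $ yambo -optics c -V RL -Q&lt;br /&gt;
 $ yambo -optics c -V io -Q      &#039;&#039;yambo.in now contains both the RL and the io variables&#039;&#039;&lt;br /&gt;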
&lt;br /&gt;
===Job script label===&lt;br /&gt;
The best way to keep track of different runs using different parameters is through the &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag. This inserts a label in all output and report files, and creates a new folder containing any new databases (i.e. they are not written in the core &#039;&#039;SAVE&#039;&#039; folder). Try:&lt;br /&gt;
 $ yambo -V RL -hf -F yambo_hf.in        &#039;&#039;and modify to&#039;&#039;&lt;br /&gt;
 FFTGvecs = 1 Ry&lt;br /&gt;
 EXXRLvcs = 1 Ry&lt;br /&gt;
 VXCRLvcs = 1 Ry&lt;br /&gt;
 $ yambo -J 1Ry -F yambo_hf.in           &#039;&#039;Run the code&#039;&#039;&lt;br /&gt;
 $ ls&lt;br /&gt;
 yambo_hf.in SAVE  &lt;br /&gt;
 o-1Ry.hf r-1Ry_HF_and_locXC 1Ry 1Ry/ndb.HF_and_locXC&lt;br /&gt;
This is extremely useful when running convergence tests, trying out different parameters, etc.&lt;br /&gt;
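As a minimal sketch of such a convergence test (the shell loop, the &amp;lt;code&amp;gt;sed&amp;lt;/code&amp;gt; edit and the file/label names are illustrative shell usage, not Yambo options):&lt;br /&gt;
 $ for ecut in 10 20 40; do&lt;br /&gt;
     yambo -hf -F hf_${ecut}Ry.in -Q&lt;br /&gt;
     sed -i &amp;quot;s/EXXRLvcs.*/EXXRLvcs= ${ecut} Ry/&amp;quot; hf_${ecut}Ry.in&lt;br /&gt;
     yambo -F hf_${ecut}Ry.in -J EXX_${ecut}Ry&lt;br /&gt;
   done&lt;br /&gt;
Each run then produces its own labelled output (e.g. &#039;&#039;o-EXX_10Ry.hf&#039;&#039;), so the results can be compared directly.&lt;br /&gt;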
&lt;br /&gt;
&#039;&#039;Exercise&#039;&#039;: use &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; to report the properties of all database files (including &#039;&#039;ndb.HF_and_locXC&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
==Links==&lt;br /&gt;
* Back to [[ICTP 2022#Tutorials]]&lt;br /&gt;
* Back to [[CECAM VIRTUAL 2021#Tutorials]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| style=&amp;quot;width:100%&amp;quot; border=&amp;quot;1&amp;quot;&lt;br /&gt;
|style=&amp;quot;width:15%; text-align:left&amp;quot;|Prev: [[CECAM_VIRTUAL_2021#Tutorials|CECAM School Home]] &lt;br /&gt;
|style=&amp;quot;width:50%; text-align:center&amp;quot;|Now: CECAM School Home -&amp;gt; [[First_steps:_walk_through_from_DFT(standalone)|First steps]] &lt;br /&gt;
|style=&amp;quot;width:35%; text-align:right&amp;quot;|Next: CECAM School Home -&amp;gt; [[Next steps: RPA calculations (standalone)|Next steps]]&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8672</id>
		<title>GW on h-BN (standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8672"/>
		<updated>2025-05-18T13:49:50Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Converging Screening Parameters */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a modified version of the full tutorial on GW computations present on the Yambo wiki. Later, you can have a look at the extended version at [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will learn how to:&lt;br /&gt;
* Calculate quasi-particle corrections in the Hartree-Fock approximation &lt;br /&gt;
* Calculate quasi-particle corrections in the GW approximation &lt;br /&gt;
* Choose the input parameters for a meaningful converged calculation&lt;br /&gt;
* Plot a band structure including quasi-particle corrections&lt;br /&gt;
We will use bulk hBN as an example system. Before starting, you need to obtain the appropriate tarball. See instructions on the [[Tutorials|main tutorials page]]. &amp;lt;br&amp;gt;&lt;br /&gt;
We strongly recommend that you first complete the [[First steps: a walk through from DFT to optical properties]] tutorial.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &#039;&#039;&#039;Prerequisites&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
At this stage, you should have already completed the following modules:&lt;br /&gt;
* [[Generating the Yambo databases|Generating the Yambo databases]]&lt;br /&gt;
* Step 2 Run [[Initialization]] tutorial&lt;br /&gt;
Now we can start to --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The aim of the present tutorial is to obtain quasiparticle corrections to the energy levels using many-body perturbation theory (MBPT). &amp;lt;br&amp;gt;&lt;br /&gt;
The general non-linear quasiparticle equation reads:&lt;br /&gt;
[[File:Eqp_1.png|none|x26px|caption]] &lt;br /&gt;
As a first step we want to evaluate the self energy Σ entering in the quasiparticle equation. In the GW approach the self-energy can be separated into two components: a static term called the exchange self-energy (Σx), and a dynamical term (energy dependent) called the correlation self-energy (Σc):&lt;br /&gt;
[[File:Sigma.png|none|x25px|caption]]&lt;br /&gt;
We will treat these two terms separately and demonstrate how to set the most important variables for calculating each term.&lt;br /&gt;
In practice we will compute  the quasi-particle corrections to the one particle Kohn-Sham eigenvalues obtained through a DFT calculation. &lt;br /&gt;
&lt;br /&gt;
The steps are the following:&lt;br /&gt;
&lt;br /&gt;
==Step 1: The Exchange Self Energy or HF quasi-particle correction==&lt;br /&gt;
&lt;br /&gt;
We start by evaluating the exchange Self-Energy and the corresponding Quasiparticle energies (Hartree-Fock energies). &lt;br /&gt;
Follow the module on &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; and then return to this tutorial.&lt;br /&gt;
&lt;br /&gt;
==Step 2: The Correlation Self-Energy and Quasiparticle Energies==&lt;br /&gt;
Once we have calculated the exchange part, we next turn our attention to the more demanding dynamical part. The correlation part of the self-energy in a plane wave representation reads:&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
In the expression for the correlation self energy, we have (1) a summation over bands, (2) an integral over the Brillouin Zone, and (3) a sum over the G vectors. In contrast with the case of Σx, the summation over bands extends over &#039;&#039;all&#039;&#039; bands (including the unoccupied ones), and so convergence tests are needed. Another important difference is that the Coulomb interaction is now screened so a fundamental ingredient is the evaluation of the dynamical dielectric matrix. The expression for the dielectric matrix, calculated at the RPA level and including local field effects, has been already treated in the [[Local fields|Local fields]] tutorial.&lt;br /&gt;
&lt;br /&gt;
In the following, we will see two ways to take into account the dynamical effects. First, we will see how to set the proper parameters to obtain a model dielectric function based on a widely used approximation, which models the energy dependence of each component of the dielectric matrix with a single pole function. &lt;br /&gt;
Secondly, we will see how to perform calculations by evaluating the dielectric matrix on a regular grid of frequencies. &lt;br /&gt;
&lt;br /&gt;
Once the correlation part of the self-energy is calculated, we will check the convergence of the different parameters with respect to some final quantity, such as the gap. &lt;br /&gt;
&lt;br /&gt;
After computing the frequency-dependent self-energy, we will discover that in order to solve the quasiparticle equation we need to know its value &#039;&#039;at the quasiparticle energy itself&#039;&#039;. In the following, unless explicitly stated, we will solve the non-linear quasi-particle equation at first order, by expanding the self-energy around the Kohn-Sham eigenvalue. In this way the quasiparticle equation reads:&lt;br /&gt;
&lt;br /&gt;
[[File:Eqp_2.png|none|x26px|caption]] &lt;br /&gt;
&lt;br /&gt;
where the normalization factor Z is defined as:&lt;br /&gt;
&lt;br /&gt;
[[File:z_fac.png|none|x40px|caption]] &lt;br /&gt;
&lt;br /&gt;
===The Plasmon Pole approximation===&lt;br /&gt;
As stated above, the basic idea of the plasmon-pole approximation is to approximate the frequency dependence of the dielectric matrix with a single pole function of the form:&lt;br /&gt;
[[File:ppa.png|none|x26px|caption]]&lt;br /&gt;
The two parameters R&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; and Ω&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; are obtained by a fit (for each component), after having calculated the RPA dielectric matrix at two given frequencies.&lt;br /&gt;
Yambo calculates the dielectric matrix in the static limit (ω=0) and at a user-defined frequency called the plasmon-pole frequency (ω=iωp). &lt;br /&gt;
Such an approximation has the big computational advantage of calculating the dielectric matrix for only two frequencies and leads to an analytical expression for the frequency integral of the correlation self-energy.&lt;br /&gt;
==== Input file generation ====&lt;br /&gt;
Let&#039;s start by building up the input file for a GW/PPA calculation, including the calculation of the exchange self-energy. From &amp;lt;code&amp;gt;yambo -H&amp;lt;/code&amp;gt; you should understand that the correct option is &amp;lt;code&amp;gt;yambo -x -p p -g n&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo -x -p p -g n -F gw_ppa.in&lt;br /&gt;
&lt;br /&gt;
Let&#039;s modify the input file in the following way: &lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] = 40         Ry    # [XX] Exchange RL components&lt;br /&gt;
 [[Variables#VXCRLvcs|VXCRLvcs]] = 3187        RL      # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;                   # [X] IP/Hartree/ALDA/LRC/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 10 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]]= 1000          mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#PPAPntXp|PPAPntXp]] = 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GDamping=  0.10000     eV    # [GW] G[W] damping&lt;br /&gt;
 dScStep=  0.10000      eV    # [GW] Energy step to evaluate Z factors&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;               # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  7|  7|  8|  9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Brief explanation of some settings:&lt;br /&gt;
* Similar to the Hartree Fock study, we will concentrate on the convergence of the &#039;&#039;&#039;direct&#039;&#039;&#039; gap of the system. Hence we select the last occupied (8) and first unoccupied (9) bands for k-point number 7 in the &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable. &lt;br /&gt;
* We also keep &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; at its converged value of 40 Ry as obtained in the &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; tutorial.&lt;br /&gt;
* For the moment we keep fixed the plasmon pole energy &amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]]&amp;lt;/code&amp;gt; at its default value (=1 Hartree).&lt;br /&gt;
* We fix the direction of the electric field used in the evaluation of the dielectric matrix to a generic value: &amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]]&amp;lt;/code&amp;gt;=(1,1,1).&lt;br /&gt;
* Later we will study convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;; for now just set them to the values indicated.&lt;br /&gt;
&lt;br /&gt;
==== Understanding the output ====&lt;br /&gt;
Let&#039;s look at the typical Yambo output. Run Yambo with an appropriate &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
   &lt;br /&gt;
 $ yambo -F gw_ppa.in -J 10b_1Ry&lt;br /&gt;
&lt;br /&gt;
In the standard output you can recognise the different steps of the calculations: calculation of the screening matrix (evaluation of the non-interacting and interacting response), calculation of the exchange self-energy, and finally the calculation of the correlation self-energy and quasiparticle energies. Moreover, information on memory usage and execution time is reported: &lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Dynamic Dielectric Matrix (PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;08s&amp;gt; Xo@q[3] |########################################| [100%] 03s(E) 03s(X)&lt;br /&gt;
 &amp;lt;08s&amp;gt; X@q[3] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [06] Bare local and non-local Exchange-Correlation&lt;br /&gt;
 &amp;lt;43s&amp;gt; EXS |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07] Dyson equation: Newton solver&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07.01] G0W0 (W PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; G0W0 (W PPA) |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; [07.02] QP properties and I/O&lt;br /&gt;
 &amp;lt;45s&amp;gt; [08] Game Over &amp;amp; Game summary&lt;br /&gt;
&lt;br /&gt;
Let&#039;s have a look at the report and output. The output file &#039;&#039;o-10b_1Ry.qp&#039;&#039; contains (for each band and k-point that we indicated in the input file) the values of the bare KS eigenvalue, its GW correction and the correlation part of the self energy:&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.411876    -0.567723   2.322443&lt;br /&gt;
       7        9      3.877976     2.413773  -2.232241&lt;br /&gt;
&lt;br /&gt;
In the header you can see the details of the calculations, for instance it reports that &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=1 Ry corresponds to 5 Gvectors: &lt;br /&gt;
&lt;br /&gt;
 #  X G`s            [used]:  5&lt;br /&gt;
&lt;br /&gt;
Other information can be found in the report file &#039;&#039;r-10b_1Ry_em1d_ppa_HF_and_locXC_gw0&#039;&#039;, such as the renormalization factor defined above, the value of the &#039;&#039;exchange&#039;&#039; self-energy (non-local XC) and of the DFT exchange-correlation potential (local XC): &lt;br /&gt;
&lt;br /&gt;
 [07.02] QP properties and I/O&lt;br /&gt;
  =============================&lt;br /&gt;
  Legend (energies in eV):&lt;br /&gt;
  - B  : Band       - Eo  : bare energy&lt;br /&gt;
  - E  : QP energy  - Z   : Renormalization factor&lt;br /&gt;
  - So : Sc(Eo)     - S   : Sc(E)&lt;br /&gt;
  - dSp: Sc derivative precision&lt;br /&gt;
  - lXC: Starting Local XC (DFT)&lt;br /&gt;
  -nlXC: Starting non-Local XC (HF)&lt;br /&gt;
  QP [eV] @ K [7] (iku): 0.000000 -0.500000  0.000000&lt;br /&gt;
   B=8 Eo= -0.41 E= -0.98 E-Eo= -0.56 Re(Z)=0.81 Im(Z)=-.2368E-2 nlXC=-19.13 lXC=-16.11 So= 2.322&lt;br /&gt;
   B=9 Eo=  3.88 E=  6.29 E-Eo=  2.41 Re(Z)=0.83 Im(Z)=-.2016E-2 nlXC=-5.536 lXC=-10.67 So=-2.232&lt;br /&gt;
&lt;br /&gt;
Extended information can also be found in the output by activating the &amp;lt;code&amp;gt;[[Variables#ExtendOut|ExtendOut]]&amp;lt;/code&amp;gt; flag in the input. This optional flag appears in the input file when the &amp;lt;code&amp;gt;-V qp&amp;lt;/code&amp;gt; verbosity option is added while building it. The plasmon pole screening, the exchange self-energy and the quasiparticle energies are also saved in databases so that they can be reused in further runs:&lt;br /&gt;
&lt;br /&gt;
 $ ls ./10b_1Ry&lt;br /&gt;
 ndb.pp ndb.pp_fragment_1 ... ndb.HF_and_locXC ndb.QP&lt;br /&gt;
&lt;br /&gt;
===Convergence tests for a quasi particle calculation===&lt;br /&gt;
&lt;br /&gt;
Now we can check the convergence of the different variables entering the expression of the correlation part of the self-energy.&amp;lt;br&amp;gt; &lt;br /&gt;
First, we focus on the parameters governing the &#039;&#039;screening matrix&#039;&#039;, which you have already seen in the [[RPA/IP]] section. In contrast to the calculation of the [[RPA/IP]] dielectric function, where you considered either the optical limit or a finite-q response (EELS), here the dielectric matrix will be calculated for &#039;&#039;all&#039;&#039; q-points determined by the choice of k-point sampling.&lt;br /&gt;
 &lt;br /&gt;
The parameters that need to be converged can be understood by looking at expression of the dielectric matrix:&lt;br /&gt;
[[File:Yambo-CH5.png|none|x30px|Yambo tutorial image]]&lt;br /&gt;
where &#039;&#039;&amp;amp;chi;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&#039;&#039; is given by&lt;br /&gt;
[[File:Dyson_rpa.png|none|x40px|Yambo tutorial image]]&lt;br /&gt;
and  &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; is given by&lt;br /&gt;
[[File:ChiO.png|none|x45px|Yambo tutorial image]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; : The dimension of the microscopic inverse matrix, related to [[Local fields]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; : The sum on bands (c,v) in the independent particle &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Converging Screening Parameters====&lt;br /&gt;
Here we will check the convergence of the gap starting from the variables controlling the screening reported above: the bands employed to build the RPA response function &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and the number of blocks (G,G&#039;) of the dielectric matrix ε&amp;lt;sup&amp;gt;-1&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;G,G&#039;&amp;lt;/sub&amp;gt;  &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
In the next section we will study convergence with respect to the sum over states (sum over &#039;&#039;m&#039;&#039; in the Σ&amp;lt;sub&amp;gt;c&amp;lt;/sub&amp;gt; expression); here let&#039;s fix &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; to a reasonable value (40 bands). &lt;br /&gt;
&lt;br /&gt;
Let&#039;s build a series of input files differing by the values of bands and block sizes in &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; considering &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in the range 10-50 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; in the range 1000 to 5000 mRy. To do this by hand, file by file, open the &#039;&#039;gw_ppa.in&#039;&#039; file in an editor and change to:&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]] = &#039;&#039;&#039;2000 mRy&#039;&#039;&#039;&lt;br /&gt;
while leaving the rest untouched. Repeat for 3000 mRy, 4000 mRy etc. Next, for each &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; change to:&lt;br /&gt;
 % [[Variables#BndsRnXp|BndsRnXp]]&lt;br /&gt;
   1 | 20 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
and repeat for 30, 40 and so on. Give a &#039;&#039;&#039;different name&#039;&#039;&#039; to each file: &#039;&#039;gw_ppa_Xb_YRy.in&#039;&#039; with X=10,20,30,40 and Y=1,2,3,4,5 Ry (i.e. 1000 to 5000 mRy).&lt;br /&gt;
&lt;br /&gt;
This is obviously quite tedious. However, you can automate both the input construction and code execution using bash or python scripts (indeed, later you will learn how to use the yambo-python [http://www.yambo-code.org/wiki/index.php?title=GW_tutorial._Convergence_and_approximations_(BN)] tool for this task). For now, you can use the simple [[bash_scripts|generate_inputs_1.sh]] bash script to generate the required input files (after copying the script you need to type &amp;lt;code&amp;gt;$ chmod +x name_of_the_script.sh &amp;lt;/code&amp;gt;).&lt;br /&gt;
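&lt;br /&gt;
If you prefer to write your own, a minimal sketch of such a generation script could look like the following (this is only an illustration, not necessarily identical to the provided [[bash_scripts|generate_inputs_1.sh]]; it assumes that the &amp;lt;code&amp;gt;BndsRnXp&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;NGsBlkXp&amp;lt;/code&amp;gt; lines of &#039;&#039;gw_ppa.in&#039;&#039; read exactly as in the template above):&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Illustrative sketch: build the gw_ppa_Xb_YRy.in files from the gw_ppa.in template&lt;br /&gt;
 for nb in 10 20 30 40 ; do&lt;br /&gt;
   for ry in 1 2 3 4 5 ; do&lt;br /&gt;
     sed -e &amp;quot;s/1 | 10 |/1 | ${nb} |/&amp;quot; -e &amp;quot;s/NGsBlkXp= 1000/NGsBlkXp= ${ry}000/&amp;quot; gw_ppa.in &amp;gt; gw_ppa_${nb}b_${ry}Ry.in&lt;br /&gt;
   done&lt;br /&gt;
 done&lt;br /&gt;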
&lt;br /&gt;
Finally launch the calculations:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_10b_1Ry.in -J 10b_1Ry&lt;br /&gt;
 $ yambo -F gw_ppa_10b_2Ry.in -J 10b_2Ry&lt;br /&gt;
 ...&lt;br /&gt;
 $ yambo -F gw_ppa_40b_5Ry.in -J 40b_5Ry&lt;br /&gt;
&lt;br /&gt;
Once the jobs are terminated we can collect the quasiparticle energies for fixed &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in different files named e.g. &#039;&#039;10b.dat, 20b.dat&#039;&#039; etc. for plotting, by putting in separate columns: the energy cutoff; the size of the G blocks; the quasiparticle energy of the valence band; and that of the conduction band.&lt;br /&gt;
To do this e.g. for the &#039;&#039;10b.dat&#039;&#039; file you can type:&lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 8&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
to parse the valence band quasiparticle energies  and &lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 9&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
for the conduction band; and put all the data in the &#039;&#039;10b.dat&#039;&#039; files. As there are many files to process you can use the [[bash_scripts|parse_qps.sh]] script to create the &#039;&#039;10b.dat&#039;&#039; file and edit the script changing the number of &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; for the other output files. &lt;br /&gt;
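&lt;br /&gt;
As an illustration only (the provided &#039;&#039;parse_qps.sh&#039;&#039; may differ), a minimal parsing sketch producing the four columns used in the plots below, and assuming the job names and output header shown above, could be:&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Illustrative sketch: columns of 10b.dat are cutoff (Ry), G vectors used, valence QP, conduction QP&lt;br /&gt;
 rm -f 10b.dat&lt;br /&gt;
 for ry in 1 2 3 4 5 ; do&lt;br /&gt;
   out=o-10b_${ry}Ry.qp&lt;br /&gt;
   ngs=$(grep &amp;quot;X G&amp;quot; ${out} | head -1 | awk &#039;{print $NF}&#039;)&lt;br /&gt;
   val=$(grep &amp;quot;7 * 8&amp;quot; ${out} | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   cond=$(grep &amp;quot;7 * 9&amp;quot; ${out} | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   echo &amp;quot;${ry} ${ngs} ${val} ${cond}&amp;quot; &amp;gt;&amp;gt; 10b.dat&lt;br /&gt;
 done&lt;br /&gt;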
&lt;br /&gt;
Once we have collected all the quasiparticle values we can plot the gap, or the valence and conduction band energies separately, as a function of the block size or energy cutoff:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:3 w lp t &amp;quot; Valence BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:3 w lp t &amp;quot;Valence BndsRnXp=20&amp;quot;,.. &lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:4 w lp t &amp;quot; Conduction BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:4 w lp t &amp;quot;Conduction BndsRnXp=20&amp;quot;,..&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot; Gap BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot;gap BndsRnXp=20&amp;quot;,..  &lt;br /&gt;
or both using e.g. the [[gnuplot_scripts|ppa_gap.gnu]] gnuplot script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect to the screening parameters&amp;quot;&amp;gt;&lt;br /&gt;
File:ppa2.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:ppa3.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking at the plot we can see that:&lt;br /&gt;
* The two parameters are not totally independent (see e.g. the valence quasiparticle convergence) and this is the reason why we converged them simultaneously &lt;br /&gt;
* The gap (energy difference) converges faster than the single quasiparticle states&lt;br /&gt;
* The convergence criterion depends on the degree of accuracy we are looking for, but considering the approximations behind the calculations (plasmon-pole etc.), it is not always a good idea to enforce too strict a criterion.  &lt;br /&gt;
* Even if not fully converged, we can consider &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;=30 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=3 Ry to be reasonable parameters.&lt;br /&gt;
&lt;br /&gt;
Finally, notice that current versions of Yambo (from 4.5 onwards) provide a terminator for the dielectric matrix that accelerates convergence with respect to the number of bands;&lt;br /&gt;
you can activate it by setting &amp;lt;code&amp;gt;XTermKind= &amp;quot;BG&amp;quot;&amp;lt;/code&amp;gt;. You can compare the convergence of the dielectric matrix with and without the terminator,&lt;br /&gt;
similarly to what happens for the self-energy (see the next section).&lt;br /&gt;
&lt;br /&gt;
====Converging the sum over states in the correlation self-energy====&lt;br /&gt;
From now on we will keep these parameters fixed and perform a convergence study on the sum over states in the correlation self-energy (Σc), controlled by &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to use the screening previously calculated we can copy the plasmon pole parameters saved in the &amp;lt;code&amp;gt;30b_3Ry&amp;lt;/code&amp;gt; directory into the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory. In this way the screening will be read by Yambo and not calculated again:&lt;br /&gt;
&lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./SAVE/.&lt;br /&gt;
&lt;br /&gt;
(Note: you may have to delete these files before running the BSE tutorials)&lt;br /&gt;
&lt;br /&gt;
In order to reuse the databases we have to make sure that the same plasmon-pole parameters are used in our input files.&lt;br /&gt;
Edit &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; and modify &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; in order to have a number of bands ranging from 10 to 80 in different files named &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;, &#039;&#039;gw_ppa_Gbnd20.in&#039;&#039; etc. You can also run the [[bash_scripts|generate_inputs_2.sh]] bash script to generate the required input files, or use the sketch below.&lt;br /&gt;
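&lt;br /&gt;
A minimal sketch of such a loop (illustrative only; it assumes that the &amp;lt;code&amp;gt;GbndRnge&amp;lt;/code&amp;gt; line of &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; still reads &amp;lt;code&amp;gt;1 | 40 |&amp;lt;/code&amp;gt; and that no other line matches that pattern) could be:&lt;br /&gt;
 for nb in 10 20 30 40 50 60 70 80 ; do&lt;br /&gt;
   sed &amp;quot;s/1 | 40 |/1 | ${nb} |/&amp;quot; gw_ppa_30b_3Ry.in &amp;gt; gw_ppa_Gbnd${nb}.in&lt;br /&gt;
 done&lt;br /&gt;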
&lt;br /&gt;
Next, launch yambo for each input:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and as done before we can inspect the obtained quasiparticle energies: &lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
for the valence band, and &lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
for the conduction band.&lt;br /&gt;
&lt;br /&gt;
Collect the results in a text file &#039;&#039;Gbnd_conv.dat&#039;&#039; containing: the number of Gbnd, the valence energy, and the conduction energy.&lt;br /&gt;
Now, as done before we can plot the valence and conduction quasiparticle levels separately as well as the gap, as a function of the number of bands used in the summation:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2 w lp lt 7  t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3 w lp lt 7  t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2) w lp lt 7  t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect to the sum over states in the correlation self-energy&amp;quot;&amp;gt;&lt;br /&gt;
File:Gbnd_val.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:Gbnd_cond.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:Gbnd_gap.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inspecting the plot we can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is rather slow and many bands are needed to get converged results.&lt;br /&gt;
* As observed above the gap (energy difference) converges faster than the single quasiparticle state energies.&lt;br /&gt;
&lt;br /&gt;
====Accelerating the sum over states convergence in the correlation self-energy====&lt;br /&gt;
In general the convergence with respect to the sum over states can be very cumbersome. Here we show how it can be mitigated by using &lt;br /&gt;
a technique developed by F. Bruneval and X. Gonze &amp;lt;ref name=&amp;quot;BG&amp;quot;&amp;gt; F. Bruneval and X. Gonze, Physical Review B 78, 085125 (2008)&amp;lt;/ref&amp;gt;. The basic idea relies on replacing the eigenenergies of the states that are not treated explicitly with a common energy, and on taking into account, through the closure relation, all the states that are not explicitly included in the calculation.&lt;br /&gt;
To apply this technique in Yambo we need to activate the optional terminator variable [[Variables#GTermKind|GTermKind]]. You can edit the input file by setting:&lt;br /&gt;
 &lt;br /&gt;
 [[Variables#GTermKind|GTermKind]]= &amp;quot;BG&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for example in the file &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;. Now, repeat this operation also in all the other &#039;&#039;gw_ppa_GbndX.in&#039;&#039; input files.&lt;br /&gt;
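&lt;br /&gt;
To avoid editing each file by hand, a quick and purely illustrative alternative is to append the variable to all of them (assuming, as is usually the case, that Yambo accepts the variable anywhere in the input file):&lt;br /&gt;
 for f in gw_ppa_Gbnd*.in ; do&lt;br /&gt;
   echo &#039;GTermKind= &amp;quot;BG&amp;quot;&#039; &amp;gt;&amp;gt; ${f}&lt;br /&gt;
 done&lt;br /&gt;
&lt;br /&gt;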
We can now perform once again the same calculations with the terminator activated&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10_term&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20_term&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and collect the new results:&lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
&lt;br /&gt;
in a new file called &#039;&#039;Gbnd_conv_terminator.dat&#039;&#039;. Now we can plot the same quantities as before by looking at the effect of having introduced the terminator. &lt;br /&gt;
&lt;br /&gt;
 gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:2 w lp t &amp;quot;Valence with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:3 w lp t &amp;quot;Conduction with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:($3-$2) w lp t &amp;quot;Gap with terminator&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Effect of the terminator on the convergence of QP energies with respect to the sum over states&amp;quot;&amp;gt;&lt;br /&gt;
File:val_t.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:cond_t.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:gap_t.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is accelerated, in particular for the single quasiparticle states.&lt;br /&gt;
* From the plot above we can see that &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;=40 is sufficient to obtain a converged gap.&lt;br /&gt;
&lt;br /&gt;
===Beyond the plasmon pole approximation: a full frequency approach - real axis integration===&lt;br /&gt;
All the calculations performed up to now were based on the plasmon pole approximation (PPA). Now we remove this approximation by evaluating numerically the frequency integral in the expression of the correlation self-energy (Σc). To this aim we need to evaluate the screening for a number of frequencies in an interval determined by the electron-hole energy differences (energy differences between empty and occupied states) entering the sum over states. &lt;br /&gt;
Let&#039;s build the input file for a full frequency calculation by simply typing:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff.in -g n -p r&lt;br /&gt;
&lt;br /&gt;
and we can set the variables studied up to now to their converged values:&lt;br /&gt;
&lt;br /&gt;
 %[[Variables#BndsRnXd|BndsRnXd]]&lt;br /&gt;
 1 | 30 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXd|NGsBlkXd]]=3000 mRy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
 1 | 40 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]]=40000 mRy&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]] &lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 %QPkrange                      # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|8|9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and next vary, in different files (name them &#039;&#039;gw_ff10.in&#039;&#039; etc.), the number of frequencies at which we evaluate the screened Coulomb potential. This is done by setting the values of [[Variables#ETStpsXd|ETStpsXd]]=10, 50, 100, 150, 200, 250. &lt;br /&gt;
You can also run the [[bash_scripts|generate_inputs_3.sh]] bash script to generate the required input files.&lt;br /&gt;
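&lt;br /&gt;
As before, an illustrative sketch (assuming &#039;&#039;gw_ff.in&#039;&#039; contains a single line starting with &amp;lt;code&amp;gt;ETStpsXd&amp;lt;/code&amp;gt;; the provided script may differ) is:&lt;br /&gt;
 for nf in 10 50 100 150 200 250 ; do&lt;br /&gt;
   sed &amp;quot;s/ETStpsXd.*/ETStpsXd= ${nf}/&amp;quot; gw_ff.in &amp;gt; gw_ff${nf}.in&lt;br /&gt;
 done&lt;br /&gt;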
&lt;br /&gt;
Next launch yambo:&lt;br /&gt;
 $ yambo -F gw_ff10.in -J ff10&lt;br /&gt;
 $ yambo -F gw_ff50.in -J ff50&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Clearly, as we are evaluating the screening for a large number of frequencies, these calculations will be heavier than the case above, where the screening was calculated at only two frequencies (zero and the plasmon-pole energy). &lt;br /&gt;
As before, collect the valence and conduction bands as a function of the number of frequencies in a file called &#039;&#039;gw_ff.dat&#039;&#039; and plot the behaviour of the conduction and valence bands and the gap.&lt;br /&gt;
 &lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Real axis integration, convergence with respect to the number of frequencies used&amp;quot;&amp;gt;&lt;br /&gt;
File:ff_v.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt;&lt;br /&gt;
File:ff_c.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:ff_g.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
&lt;br /&gt;
* Oscillations are still present, indicating that even more frequencies have to be considered. In general, a real-axis calculation is very demanding. &lt;br /&gt;
* The final result of the gap obtained up to now does not differ too much from the one obtained at the plasmon-pole level (~50 meV)&lt;br /&gt;
&lt;br /&gt;
As the real-axis calculation is computationally demanding, in recent years an alternative method called the Multipole Approximation (MPA) has been developed in Yambo. The MPA can be seen as an extension of the PPA, where a finite number of poles is used instead of just one. It has been shown that even a small number of poles makes the MPA results on par with a full real-axis calculation. For your reference, there is a tutorial on the MPA method at [[Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)|Quasi-particles and Self-energy within the Multipole Approximation (MPA)]]. For now, however, let us move on to the next section.&lt;br /&gt;
&lt;br /&gt;
===Secant Solver===&lt;br /&gt;
&lt;br /&gt;
The real-axis integration also allows one to go beyond the linear expansion when solving the quasi-particle equation. &lt;br /&gt;
The QP equation is a non-linear equation whose solution must be found using a suitable numerical algorithm. &lt;br /&gt;
[[File:Eqp_sec.png|none|x22px|caption]] &lt;br /&gt;
The most commonly used approach, based on the linearization of the self-energy operator, is the Newton method, which is the one we have used up to now. &lt;br /&gt;
Yambo can also perform a search of the QP energies using a non-linear iterative method based on the [https://en.wikipedia.org/wiki/Secant_method Secant iterative Method].&lt;br /&gt;
&lt;br /&gt;
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines to better approximate a root of a function &#039;&#039;f&#039;&#039;. The secant method can be thought of as a finite difference approximation of Newton&#039;s method. &lt;br /&gt;
The equation that defines the secant method is: &lt;br /&gt;
&lt;br /&gt;
[[File:secant_eq.png|none|x35px|caption]] &lt;br /&gt;
&lt;br /&gt;
The first two iterations of the secant method are shown in the following picture. The red curve shows the function f and the blue lines are the secants.&lt;br /&gt;
&lt;br /&gt;
[[File:Secant_method.png|center|250px|caption]] &lt;br /&gt;
&lt;br /&gt;
To see if there is any non-linear effect in the solution of the Dyson equation, we compare the result obtained with the Newton solver, as done before, with the present case. &lt;br /&gt;
In order to use the secant method you need to edit one of the previous &#039;&#039;gw_ffN.in&#039;&#039; files, e.g. &#039;&#039;gw_ff100.in&#039;&#039;, and change:&lt;br /&gt;
&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;&lt;br /&gt;
to &lt;br /&gt;
 DysSolver= &amp;quot;s&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Repeat the calculations in the same way as before:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff100.in -J ff100&lt;br /&gt;
&lt;br /&gt;
Note that now the screening will &#039;&#039;not&#039;&#039; be calculated again, as it has been stored in the &#039;&#039;ffN&#039;&#039; directories, as can be seen in the report file:&lt;br /&gt;
&lt;br /&gt;
 [05] Dynamical Dielectric Matrix&lt;br /&gt;
 ================================&lt;br /&gt;
 [RD./ff10//ndb.em1d]----&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./ff10//ndb.em1d_fragment_1]--------------&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Comparing the output files, e.g. for the case with 100 freqs:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp&#039;&#039;  &#039;&#039;&#039;Newton Solver:&#039;&#039;&#039;&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.41188     -0.08708      2.91254 &lt;br /&gt;
       7        9      3.877976     1.421968    -3.417357 &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp_01&#039;&#039; &#039;&#039;&#039;Secant Solver:&#039;&#039;&#039;&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.41188     -0.08715      2.93518 &lt;br /&gt;
       7        9      3.877976     1.401408    -3.731649 &lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
From the comparison, we see that the effect is of the order of 20 meV, the same order of magnitude as the accuracy of the GW calculations.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Interpolating Band Structures==&lt;br /&gt;
Up to now we have checked convergence for the gap. Now we want to calculate the quasiparticle corrections across the Brillouin zone in order to visualize the entire band structure along a path connecting high symmetry points.&lt;br /&gt;
&lt;br /&gt;
To do that we start by calculating the QP correction in the plasmon-pole approximation for all the k points of our sampling and for a number of bands around the gap. You can use a previous input file or generate a new one: &amp;lt;code&amp;gt; yambo -F gw_ppa_all_Bz.in -x -p p -g n &amp;lt;/code&amp;gt; and set the parameters found in the previous tests:&lt;br /&gt;
&lt;br /&gt;
 EXXRLvcs=  40        Ry &lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 30 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 3000            mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
and we calculate it for all the available kpoints by setting:&lt;br /&gt;
 %QPkrange                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  1| 14|  6|11|&lt;br /&gt;
 %&lt;br /&gt;
 &lt;br /&gt;
Note that, as we have already calculated the screening with these parameters, you can save time and reuse that database: either run with &amp;lt;code&amp;gt; -J 30b_3Ry &amp;lt;/code&amp;gt; so that the previous directory is used, or, if you prefer to keep the new databases in a new all_Bz directory, create it and copy the screening databases there:&lt;br /&gt;
&lt;br /&gt;
 $ mkdir all_Bz &lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./all_Bz/&lt;br /&gt;
&lt;br /&gt;
and launch the calculation:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_all_Bz.in -J all_Bz&lt;br /&gt;
&lt;br /&gt;
Now we can inspect the output and see that it contains the correction for all the k points for the bands we asked:&lt;br /&gt;
&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       1        6      -1.299712    -0.219100     3.788044&lt;br /&gt;
       1        7      -1.296430    -0.241496     3.788092&lt;br /&gt;
       1        8      -1.296420    -0.243115     3.785947&lt;br /&gt;
       1        9       4.832399     0.952386    -3.679259&lt;br /&gt;
       1        10      10.76416     2.09915     -4.38743&lt;br /&gt;
       1        11      11.36167     2.48053     -3.91021&lt;br /&gt;
....&lt;br /&gt;
By plotting some of the &#039;&#039;o-all_Bz.qp&#039;&#039; columns it is possible to discuss some physical properties of the hBN QPs. Using columns 3 and (3+4), i.e. plotting the GW energies with respect to the LDA energies, we can deduce the band gap renormalization and the stretching of the conduction/valence bands:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p  t &amp;quot;Eqp vs Elda&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[File:EqpvE0.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
Essentially we can see that the effect of the GW self-energy is the opening of the gap and a linear stretching of the conduction/valence bands, which can be estimated by performing a linear fit of the positive and negative energies (the zero is set at the top of the valence band). &lt;br /&gt;
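&lt;br /&gt;
As a rough illustration of such a fit (a sketch using gnuplot&#039;s &amp;lt;code&amp;gt;fit&amp;lt;/code&amp;gt; command; separating valence and conduction states by the sign of the Kohn-Sham energy is an assumption of this example):&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; fv(x) = av*x + bv&lt;br /&gt;
 gnuplot&amp;gt; fc(x) = ac*x + bc&lt;br /&gt;
 gnuplot&amp;gt; fit fv(x) &amp;quot;o-all_Bz.qp&amp;quot; u ($3&amp;lt;0 ? $3 : 1/0):($3+$4) via av,bv&lt;br /&gt;
 gnuplot&amp;gt; fit fc(x) &amp;quot;o-all_Bz.qp&amp;quot; u ($3&amp;gt;0 ? $3 : 1/0):($3+$4) via ac,bc&lt;br /&gt;
 gnuplot&amp;gt; print av, ac&lt;br /&gt;
&lt;br /&gt;
The slopes av and ac give an estimate of the stretching of the valence and conduction bands, respectively.&lt;br /&gt;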
&lt;br /&gt;
In order to calculate the band structure, however, we need to interpolate the values we have calculated above on a given path. In Yambo the interpolation is done by the executable &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; (Yambo Post Processing).&lt;br /&gt;
By typing: &lt;br /&gt;
 $ ypp -h  &lt;br /&gt;
you will recognize that in order to interpolate the bands we need to build a ypp input file using&lt;br /&gt;
 $ ypp -s b&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Before editing the &#039;&#039;ypp.in&#039;&#039; input file and running the interpolation, it is important to know that &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; uses an algorithm &amp;lt;ref&amp;gt; Warren E. Pickett, Henry Krakauer, and Philip B. Allen Phys. Rev. B 38, 2721 &amp;lt;/ref&amp;gt;  that cannot be used in presence of time-reversal (TR) symmetry. &lt;br /&gt;
As a first step we therefore remove the TR symmetry by typing:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -y&lt;br /&gt;
&lt;br /&gt;
and we uncomment the corresponding line to remove the TR.&lt;br /&gt;
&lt;br /&gt;
 fixsyms                      # [R] Reduce Symmetries&lt;br /&gt;
 #RmAllSymm                   # Remove all symmetries&lt;br /&gt;
 #RmTimeRev                   # Remove Time Reversal&lt;br /&gt;
&lt;br /&gt;
and launch&lt;br /&gt;
 &lt;br /&gt;
 $ ypp&lt;br /&gt;
&lt;br /&gt;
This will create a new directory called &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; where a &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory containing the electronic structure in the absence of TR is present. &lt;br /&gt;
We will calculate the band structure in the &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
 $ cd FixSymm&lt;br /&gt;
&lt;br /&gt;
After having performed the usual setup&lt;br /&gt;
&lt;br /&gt;
 $yambo&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
we can generate the input for the band interpolation:&lt;br /&gt;
&lt;br /&gt;
 $ypp -s b -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
and edit the &#039;&#039;ypp_bands.in&#039;&#039; file:&lt;br /&gt;
&lt;br /&gt;
 electrons                    # [R] Electrons (and holes)&lt;br /&gt;
 bnds                         # [R] Bands&lt;br /&gt;
 INTERP_mode= &amp;quot;BOLTZ&amp;quot;         # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach) &lt;br /&gt;
 OutputAlat= 4.716000           # [a.u.] Lattice constant used for &amp;quot;alat&amp;quot; ouput format&lt;br /&gt;
 cooIn= &amp;quot;rlu&amp;quot;                   # Points coordinates (in) cc/rlu/iku/alat&lt;br /&gt;
 cooOut= &amp;quot;rlu&amp;quot;     &lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   1 | 100 |                   # Number of bands&lt;br /&gt;
 %&lt;br /&gt;
 INTERP_Shell_Fac= 20.00000     # The bigger it is a higher number of shells is used&lt;br /&gt;
 CIRCUIT_E_DB_path= &amp;quot;none&amp;quot;      # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)&lt;br /&gt;
 BANDS_path= &amp;quot;&amp;quot;                 # BANDS path points labels (G,M,K,L...)&lt;br /&gt;
 BANDS_steps= 10  &lt;br /&gt;
 #BANDS_built_in                # Print the bands of the generating points of the circuit using the nearest internal point&lt;br /&gt;
 %BANDS_kpts   &lt;br /&gt;
 % &lt;br /&gt;
&lt;br /&gt;
We modify the following lines:&lt;br /&gt;
 BANDS_steps=30&lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   6 | 11 |                   # Number of bands &lt;br /&gt;
 %&lt;br /&gt;
 %BANDS_kpts                    # K points of the bands circuit&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.00000 |0.00000 |0.00000 |&lt;br /&gt;
  0.50000 |-.50000 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.50000 |&lt;br /&gt;
  0.00000 |0.00000 |0.50000 |&lt;br /&gt;
  0.50000 |-.50000 |0.50000 |&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
which means that we assign 30 points to each segment, we ask to interpolate 3 occupied and 3 empty bands, and we assign the following path passing through the high-symmetry points: K Γ M K H A L.&lt;br /&gt;
Launching:&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
will produce the output file &#039;&#039;o.bands_interpolated&#039;&#039; containing:&lt;br /&gt;
&lt;br /&gt;
                 &lt;br /&gt;
 #&lt;br /&gt;
 #   |k|        b6         b7         b8         b9         b10        b11        kx         ky         kz&lt;br /&gt;
 #&lt;br /&gt;
 #&lt;br /&gt;
     0.00000   -7.22092   -0.13402   -0.13395    4.67691    4.67694   10.08905    0.33300   -0.66667    0.00000&lt;br /&gt;
     0.03725   -7.18857   -0.17190   -0.12684    4.66126    4.71050   10.12529    0.32190   -0.64445    0.00000&lt;br /&gt;
...&lt;br /&gt;
 &lt;br /&gt;
and we can plot the bands using gnuplot:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o.bands_interpolated&amp;quot; u 0:2 w l, &amp;quot;o.bands_interpolated&amp;quot; u 0:3 w l, ...&lt;br /&gt;
&lt;br /&gt;
[[File:bands_lda.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
and you can recognize the index of the high symmetry point by inspecting the last three columns.&lt;br /&gt;
Note that up to now we have interpolated the LDA band structure. In order to plot the GW band structure, we need to tell &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; in the input file where the &#039;&#039;ndb.QP&#039;&#039; database is found. This is achieved by adding in the &#039;&#039;ypp_bands.in&#039;&#039; file the line:&lt;br /&gt;
&lt;br /&gt;
  GfnQPdb= &amp;quot;E &amp;lt; ./all_Bz/ndb.QP&amp;quot;&lt;br /&gt;
&lt;br /&gt;
and relaunch &lt;br /&gt;
&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
Now the file &#039;&#039;o.bands_interpolated_01&#039;&#039; contains the GW interpolated band structure. We can plot the LDA and GW band structure together by using the gnuplot script [[gnuplot_scripts|bands.gnu]].&lt;br /&gt;
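&lt;br /&gt;
For instance, a quick way to overlay the two sets of bands (only an illustration; the provided &#039;&#039;bands.gnu&#039;&#039; script may differ) is:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p for [i=2:7] &amp;quot;o.bands_interpolated&amp;quot; u 1:i w l lc rgb &amp;quot;black&amp;quot; notitle, for [i=2:7] &amp;quot;o.bands_interpolated_01&amp;quot; u 1:i w l lc rgb &amp;quot;red&amp;quot; notitle&lt;br /&gt;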
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines  widths=500px heights=500px  perrow=2 caption=&amp;quot;Band structure of bulk hBN&amp;quot;&amp;gt;&lt;br /&gt;
File:hBN_bands.png| LDA and GW band structure &lt;br /&gt;
File:hBN_bands_lit.png| LDA and GW band structure from Ref. &amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*As expected the effect of the GW correction is to open the gap.&lt;br /&gt;
*Comparing the obtained band structure with the one found in the literature by Arnaud and coworkers &amp;lt;ref name=&amp;quot;Arnaud&amp;quot;&amp;gt; B. Arnaud, S. Lebegue, P. Rabiller, and M. Alouani, Phys. Rev. Lett. 96, 026402 (2006)&amp;lt;/ref&amp;gt; we find a very nice qualitative agreement. &lt;br /&gt;
*Quantitatively we find a smaller gap: about 5.2 eV (indirect gap) and 5.7 eV (direct gap), while in Ref.&amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; 5.95 eV is found for the indirect gap and 6.47 eV for the minimum direct gap. Other values are also reported in the literature, depending on the pseudopotentials used, the starting functional and the type of self-consistency (see below). &lt;br /&gt;
*The present tutorial has been done with a small k-point grid, which is an important parameter to be checked, so convergence with respect to the k-point sampling has to be validated.&lt;br /&gt;
&lt;br /&gt;
==Step 4: Summary of the convergence parameters==&lt;br /&gt;
We have calculated the band structure of hBN starting from a DFT calculation; here we summarize the main variables we have checked to achieve convergence:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; # [XX] Exchange RL components&lt;br /&gt;
Number of G-vectors in the exchange. This number should be checked carefully. Generally a large number is needed, as the QP energies show a slow convergence. The calculation of the exchange part is rather fast. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; #[Xp] Polarization function bands&lt;br /&gt;
Number of bands in the independent-particle response function from which the dielectric matrix is calculated. This parameter also has to be checked carefully, together with NGsBlkXp, as the two variables are interconnected.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;  # [Xp] Response block size&lt;br /&gt;
Number of G-vector blocks in the dielectric matrix, to be checked carefully together with BndsRnXp. A large number of bands and blocks can make the calculation very demanding.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]] &amp;lt;/code&amp;gt; # [Xp] [cc] Electric Field&lt;br /&gt;
Direction of the electric field for the calculation of the q=0 component of the dielectric function e(q,w). In a bulk system it can be set to (1,1,1); attention must be paid for non-3D systems.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]] &amp;lt;/code&amp;gt; # [Xp] Plasmon pole imaginary energy: this is the second frequency used to fit the Godby-Needs plasmon-pole model (PPM). If the results change significantly when this frequency is varied, the PPM is not adequate for your calculation and you need to go beyond it, e.g. with a real-axis integration. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]] &amp;lt;/code&amp;gt; # [GW] G[W] bands range&lt;br /&gt;
Number of bands used to expand the Green&#039;s function. This number is usually larger than the number of bands used to calculate the dielectric matrix. Single quasiparticle energies converge slowly with respect to GbndRnge; energy differences behave better. You can use the terminator technique to mitigate the slow convergence. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GDamping|GDamping]] &amp;lt;/code&amp;gt; # [GW] G[W] damping&lt;br /&gt;
Small damping in the Green&#039;s function definition, the delta &lt;br /&gt;
parameter. The final result should not depend on it; it is usually set at 0.1 eV.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#dScStep|dScStep]] &amp;lt;/code&amp;gt; # [GW] Energy step to evaluate Z factors&lt;br /&gt;
Numerical energy step used to evaluate the derivative of the self-energy entering the renormalization factor Z.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#DysSolver|DysSolver]] &amp;lt;/code&amp;gt; # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
Parameter selecting the solver of the Dyson equation: &amp;quot;n&amp;quot; Newton linearization, &amp;quot;s&amp;quot; non-linear secant method.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GTermKind|GTermKind]] &amp;lt;/code&amp;gt; # [GW] GW terminator &lt;br /&gt;
Terminator for the self-energy&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;. We have seen how this speeds up the convergence with respect to the number of empty bands.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]] &amp;lt;/code&amp;gt; # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
K-points and band range where you want to calculate the GW correction. The syntax is&lt;br /&gt;
first kpoint | last kpoint | first band | last band&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8671</id>
		<title>GW on h-BN (standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8671"/>
		<updated>2025-05-18T13:45:48Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Converging Screening Parameters */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a modified version of the full tutorial on GW computations present on the Yambo wiki. Later, you can have a look at the extended version at [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will learn how to:&lt;br /&gt;
* Calculate quasi-particle corrections in the Hartree-Fock approximation &lt;br /&gt;
* Calculate quasi-particle corrections in the GW approximation &lt;br /&gt;
* Choose the input parameters for a meaningful converged calculation&lt;br /&gt;
* Plot a band structure including quasi-particle corrections&lt;br /&gt;
We will use bulk hBN as an example system. Before starting, you need to obtain the appropriate tarball. See instructions on the [[Tutorials|main tutorials page]]. &amp;lt;br&amp;gt;&lt;br /&gt;
We strongly recommend that you first complete the [[First steps: a walk through from DFT to optical properties]] tutorial.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &#039;&#039;&#039;Prerequisites&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
At this stage, you should have already completed the following modules:&lt;br /&gt;
* [[Generating the Yambo databases|Generating the Yambo databases]]&lt;br /&gt;
* Step 2 Run [[Initialization]] tutorial&lt;br /&gt;
Now we can start to --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The aim of the present tutorial is to obtain quasiparticle corrections to the energy levels using many-body perturbation theory (MBPT). &amp;lt;br&amp;gt;&lt;br /&gt;
The general non-linear quasiparticle equation reads:&lt;br /&gt;
[[File:Eqp_1.png|none|x26px|caption]] &lt;br /&gt;
As a first step we want to evaluate the self energy Σ entering in the quasiparticle equation. In the GW approach the self-energy can be separated into two components: a static term called the exchange self-energy (Σx), and a dynamical term (energy dependent) called the correlation self-energy (Σc):&lt;br /&gt;
[[File:Sigma.png|none|x25px|caption]]&lt;br /&gt;
We will treat these two terms separately and demonstrate how to set the most important variables for calculating each term.&lt;br /&gt;
In practice we will compute  the quasi-particle corrections to the one particle Kohn-Sham eigenvalues obtained through a DFT calculation. &lt;br /&gt;
&lt;br /&gt;
The steps are the following:&lt;br /&gt;
&lt;br /&gt;
==Step 1: The Exchange Self Energy or HF quasi-particle correction==&lt;br /&gt;
&lt;br /&gt;
We start by evaluating the exchange Self-Energy and the corresponding Quasiparticle energies (Hartree-Fock energies). &lt;br /&gt;
Follow the module on &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; and then return to this tutorial.&lt;br /&gt;
&lt;br /&gt;
==Step 2: The Correlation Self-Energy and Quasiparticle Energies==&lt;br /&gt;
Once we have calculated the exchange part, we next turn our attention to the more demanding dynamical part. The correlation part of the self-energy in a plane wave representation reads:&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
In the expression for the correlation self energy, we have (1) a summation over bands, (2) an integral over the Brillouin Zone, and (3) a sum over the G vectors. In contrast with the case of Σx, the summation over bands extends over &#039;&#039;all&#039;&#039; bands (including the unoccupied ones), and so convergence tests are needed. Another important difference is that the Coulomb interaction is now screened so a fundamental ingredient is the evaluation of the dynamical dielectric matrix. The expression for the dielectric matrix, calculated at the RPA level and including local field effects, has been already treated in the [[Local fields|Local fields]] tutorial.&lt;br /&gt;
&lt;br /&gt;
In the following, we will see two ways to take into account the dynamical effects. First, we will see how to set the proper parameters to obtain a model dielectric function based on a widely used approximation, which models the energy dependence of each component of the dielectric matrix with a single pole function. &lt;br /&gt;
Secondly, we will see how to perform calculations by evaluating the dielectric matrix on a regular grid of frequencies. &lt;br /&gt;
&lt;br /&gt;
Once the correlation part of the self-energy is calculated, we will check the convergence of the different parameters with respect to some final quantity, such as the gap. &lt;br /&gt;
&lt;br /&gt;
After computing the frequency dependent self-energy, we will discover that in order to solve the quasiparticle equation we will need to know its value &#039;&#039;at the value of the quasiparticle itself&#039;&#039;. In the following, unless explicitly stated, we will solve the non-linear quasi-particle equation at first order, by expanding the self-energy around the Kohn-Sham eigenvalue. In this way the quasiparticle equation reads:&lt;br /&gt;
&lt;br /&gt;
[[File:Eqp_2.png|none|x26px|caption]] &lt;br /&gt;
&lt;br /&gt;
where the normalization factor Z is defined as:&lt;br /&gt;
&lt;br /&gt;
[[File:z_fac.png|none|x40px|caption]] &lt;br /&gt;
&lt;br /&gt;
===The Plasmon Pole approximation===&lt;br /&gt;
As stated above, the basic idea of the plasmon-pole approximation is to approximate the frequency dependence of the dielectric matrix with a single pole function of the form:&lt;br /&gt;
[[File:ppa.png|none|x26px|caption]]&lt;br /&gt;
The two parameters R&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; and Ω&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; are obtained by a fit (for each component), after having calculated the RPA dielectric matrix at two given frequencies.&lt;br /&gt;
Yambo calculates the dielectric matrix in the static limit ( ω=0) and at a user defined frequency called the plasmon-pole frequency (ω=iωp). &lt;br /&gt;
Such an approximation has the big computational advantage of calculating the dielectric matrix for only two frequencies and leads to an analytical expression for the frequency integral of the correlation self-energy.&lt;br /&gt;
==== Input file generation ====&lt;br /&gt;
Let&#039;s start by building up the input file for a GW/PPA calculation, including the calculation of the exchange self-energy. From &amp;lt;code&amp;gt;yambo -H&amp;lt;/code&amp;gt; you should understand that the correct option is &amp;lt;code&amp;gt;yambo -x -p p -g n&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo -x -p p -g n -F gw_ppa.in&lt;br /&gt;
&lt;br /&gt;
Let&#039;s modify the input file in the following way: &lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] = 40         Ry    # [XX] Exchange RL components&lt;br /&gt;
 [[Variables#VXCRLvcs|VXCRLvcs]] = 3187        RL      # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;                   # [X] IP/Hartree/ALDA/LRC/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 10 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]]= 1000          mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#PPAPntXp|PPAPntXp]] = 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GDamping=  0.10000     eV    # [GW] G[W] damping&lt;br /&gt;
 dScStep=  0.10000      eV    # [GW] Energy step to evaluate Z factors&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;               # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  7|  7|  8|  9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Brief explanation of some settings:&lt;br /&gt;
* Similar to the Hartree Fock study, we will concentrate on the convergence of the &#039;&#039;&#039;direct&#039;&#039;&#039; gap of the system. Hence we select the last occupied (8) and first unoccupied (9) bands for k-point number 7 in the &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable. &lt;br /&gt;
* We also keep &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; at its converged value of 40 Ry as obtained in the &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; tutorial.&lt;br /&gt;
* For the moment we keep the plasmon pole energy &amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]]&amp;lt;/code&amp;gt; fixed at its default value (1 Hartree = 27.21138 eV).&lt;br /&gt;
* We fix the direction of the electric field used for the evaluation of the dielectric matrix to the non-specific value &amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]]&amp;lt;/code&amp;gt;=(1,1,1).&lt;br /&gt;
* Later we will study convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;; for now just set them to the values indicated.&lt;br /&gt;
&lt;br /&gt;
==== Understanding the output ====&lt;br /&gt;
Let&#039;s look at the typical Yambo output. Run Yambo with an appropriate &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
   &lt;br /&gt;
 $ yambo -F gw_ppa.in -J 10b_1Ry&lt;br /&gt;
&lt;br /&gt;
In the standard output you can recognise the different steps of the calculation: calculation of the screening matrix (evaluation of the non-interacting and interacting response), calculation of the exchange self-energy, and finally the calculation of the correlation self-energy and quasiparticle energies. Moreover, information on memory usage and execution time is reported: &lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Dynamic Dielectric Matrix (PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;08s&amp;gt; Xo@q[3] |########################################| [100%] 03s(E) 03s(X)&lt;br /&gt;
 &amp;lt;08s&amp;gt; X@q[3] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [06] Bare local and non-local Exchange-Correlation&lt;br /&gt;
 &amp;lt;43s&amp;gt; EXS |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07] Dyson equation: Newton solver&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07.01] G0W0 (W PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; G0W0 (W PPA) |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; [07.02] QP properties and I/O&lt;br /&gt;
 &amp;lt;45s&amp;gt; [08] Game Over &amp;amp; Game summary&lt;br /&gt;
&lt;br /&gt;
Let&#039;s have a look at the report and output. The output file &#039;&#039;o-10b_1Ry.qp&#039;&#039; contains (for each band and k-point that we indicated in the input file) the values of the bare KS eigenvalue, its GW correction and the correlation part of the self energy:&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.411876    -0.567723   2.322443&lt;br /&gt;
       7        9      3.877976     2.413773  -2.232241&lt;br /&gt;
&lt;br /&gt;
In the header you can see the details of the calculation; for instance, it reports that &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=1 Ry corresponds to 5 G-vectors: &lt;br /&gt;
&lt;br /&gt;
 #  X G`s            [used]:  5&lt;br /&gt;
&lt;br /&gt;
Other information can be found in the report file &#039;&#039;r-10b_1Ry_em1d_ppa_HF_and_locXC_gw0&#039;&#039;, such as the renormalization factor defined above, the value of the &#039;&#039;exchange&#039;&#039; self-energy (non-local XC) and of the DFT exchange-correlation potential (local XC): &lt;br /&gt;
&lt;br /&gt;
 [07.02] QP properties and I/O&lt;br /&gt;
  =============================&lt;br /&gt;
  Legend (energies in eV):&lt;br /&gt;
  - B  : Band       - Eo  : bare energy&lt;br /&gt;
  - E  : QP energy  - Z   : Renormalization factor&lt;br /&gt;
  - So : Sc(Eo)     - S   : Sc(E)&lt;br /&gt;
  - dSp: Sc derivative precision&lt;br /&gt;
  - lXC: Starting Local XC (DFT)&lt;br /&gt;
  -nlXC: Starting non-Local XC (HF)&lt;br /&gt;
  QP [eV] @ K [7] (iku): 0.000000 -0.500000  0.000000&lt;br /&gt;
   B=8 Eo= -0.41 E= -0.98 E-Eo= -0.56 Re(Z)=0.81 Im(Z)=-.2368E-2 nlXC=-19.13 lXC=-16.11 So= 2.322&lt;br /&gt;
   B=9 Eo=  3.88 E=  6.29 E-Eo=  2.41 Re(Z)=0.83 Im(Z)=-.2016E-2 nlXC=-5.536 lXC=-10.67 So=-2.232&lt;br /&gt;
&lt;br /&gt;
Extended information can also be found in the output by activating the &amp;lt;code&amp;gt;[[Variables#ExtendOut|ExtendOut]]&amp;lt;/code&amp;gt; flag in the input. This optional flag appears in the input file when it is generated with the &amp;lt;code&amp;gt;-V qp&amp;lt;/code&amp;gt; verbosity option. The plasmon pole screening, the exchange self-energy and the quasiparticle energies are also saved in databases so that they can be reused in further runs:&lt;br /&gt;
&lt;br /&gt;
 $ ls ./10b_1Ry&lt;br /&gt;
 ndb.pp ndb.pp_fragment_1 ... ndb.HF_and_locXC ndb.QP&lt;br /&gt;
&lt;br /&gt;
===Convergence tests for a quasi particle calculation===&lt;br /&gt;
&lt;br /&gt;
Now we can check the convergence of the different variables entering the expression of the correlation part of the self-energy.&amp;lt;br&amp;gt; &lt;br /&gt;
First, we focus on the parameters governing the &#039;&#039;screening matrix&#039;&#039; that you have already seen in the [[RPA/IP]] section. In contrast to the calculation of the [[RPA/IP]] dielectric function, where you considered either the optical limit or a finite q response (EELS), here the dielectric matrix will be calculated for &#039;&#039;all&#039;&#039; q-points determined by the choice of k-point sampling.&lt;br /&gt;
 &lt;br /&gt;
The parameters that need to be converged can be understood by looking at the expression of the dielectric matrix:&lt;br /&gt;
[[File:Yambo-CH5.png|none|x30px|Yambo tutorial image]]&lt;br /&gt;
where &#039;&#039;&amp;amp;chi;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&#039;&#039; is given by&lt;br /&gt;
[[File:Dyson_rpa.png|none|x40px|Yambo tutorial image]]&lt;br /&gt;
and  &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; is given by&lt;br /&gt;
[[File:ChiO.png|none|x45px|Yambo tutorial image]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; : The dimension of the microscopic inverse matrix, related to [[Local fields]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; : The sum on bands (c,v) in the independent particle &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Converging Screening Parameters====&lt;br /&gt;
Here we will check the convergence of the gap starting from the variables controlling the screening reported above: the bands employed to build the RPA response function &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and the number of blocks (G,G&#039;) of the dielectric matrix ε&amp;lt;sup&amp;gt;-1&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;G,G&#039;&amp;lt;/sub&amp;gt;  &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
In the next section we will study convergence with respect to the sum over states (sum over &#039;&#039;m&#039;&#039; in the Σ&amp;lt;sub&amp;gt;c&amp;lt;/sub&amp;gt; expression); here let&#039;s fix &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; to a reasonable value (40 bands). &lt;br /&gt;
&lt;br /&gt;
Let&#039;s build a series of input files differing by the number of bands and the block size in &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;, considering &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; upper limits in the range 10-50 and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; in the range 1000 to 5000 mRy. To do this by hand, file by file, open the &#039;&#039;gw_ppa.in&#039;&#039; file in an editor and change to:&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]] = &#039;&#039;&#039;2000 mRy&#039;&#039;&#039;&lt;br /&gt;
while leaving the rest untouched. Repeat for 3000 mRy, 4000 mRy etc. Next, for each &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; change to:&lt;br /&gt;
 % [[Variables#BndsRnXp|BndsRnXp]]&lt;br /&gt;
   1 | 20 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
and repeat for 30, 40 and so on. Give a &#039;&#039;&#039;different name&#039;&#039;&#039; to each file: &#039;&#039;gw_ppa_Xb_YRy.in&#039;&#039; with X=10,20,30,40 and Y=1000,2000,3000,4000,5000 mRy.&lt;br /&gt;
&lt;br /&gt;
This is obviously quite tedious. However, you can automate both the input construction and the code execution using bash or python scripts (indeed, later you will learn how to use the [http://www.yambo-code.org/wiki/index.php?title=GW_tutorial._Convergence_and_approximations_(BN) yambo-python] tool for this task). For now, you can use the simple [[bash_scripts|generate_inputs_1.sh]] bash script to generate the required input files (after copying the script you need to make it executable with &amp;lt;code&amp;gt;$ chmod +x name_of_the_script.sh&amp;lt;/code&amp;gt;).&lt;br /&gt;
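&lt;br /&gt;
If you prefer to write your own generator, a minimal bash sketch is shown below. This is only an illustration (not necessarily identical to &#039;&#039;generate_inputs_1.sh&#039;&#039;), and it assumes that the reference file &#039;&#039;gw_ppa.in&#039;&#039; still contains the lines &amp;lt;code&amp;gt;1 | 10 |&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;NGsBlkXp= 1000&amp;lt;/code&amp;gt; exactly as shown above:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: generate one input file per (BndsRnXp, NGsBlkXp) pair from gw_ppa.in&lt;br /&gt;
 for nb in 10 20 30 40 ; do&lt;br /&gt;
   for ry in 1 2 3 4 5 ; do&lt;br /&gt;
     # replace the polarization-bands upper limit and the response block size&lt;br /&gt;
     sed -e &amp;quot;s/1 | 10 |/1 | ${nb} |/&amp;quot; \&lt;br /&gt;
         -e &amp;quot;s/NGsBlkXp= 1000/NGsBlkXp= ${ry}000/&amp;quot; gw_ppa.in &amp;gt; gw_ppa_${nb}b_${ry}Ry.in&lt;br /&gt;
   done&lt;br /&gt;
 done&lt;br /&gt;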
&lt;br /&gt;
Finally launch the calculations:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_10b_1Ry.in -J 10b_1Ry&lt;br /&gt;
 $ yambo -F gw_ppa_10b_2Ry.in -J 10b_2Ry&lt;br /&gt;
 ...&lt;br /&gt;
 $ yambo -F gw_ppa_40b_5Ry.in -J 40b_5Ry&lt;br /&gt;
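&lt;br /&gt;
Equivalently, the whole series can be launched with a small loop (a sketch: it simply derives the &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; label from the name of each input file):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: run all the generated inputs, labelling each run after its input file&lt;br /&gt;
 for f in gw_ppa_*b_*Ry.in ; do&lt;br /&gt;
   yambo -F ${f} -J $(basename ${f} .in | sed &#039;s/gw_ppa_//&#039;)&lt;br /&gt;
 done&lt;br /&gt;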
&lt;br /&gt;
Once the jobs have finished we can collect the quasiparticle energies at fixed &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; into different files named e.g. &#039;&#039;10b.dat, 20b.dat&#039;&#039; etc. for plotting, putting in separate columns: the energy cutoff; the size of the G blocks; the quasiparticle energy of the valence band; and that of the conduction band.&lt;br /&gt;
To do this e.g. for the &#039;&#039;10b.dat&#039;&#039; file you can type:&lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 8&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
to parse the valence band quasiparticle energies  and &lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 9&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
for the conduction band, and put all the data in the &#039;&#039;10b.dat&#039;&#039; file. As there are many files to process you can use the [[bash_scripts|parse_qps.sh]] script to create the &#039;&#039;10b.dat&#039;&#039; file, editing the script to change the value of &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; for the other output files. &lt;br /&gt;
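&lt;br /&gt;
A minimal sketch of such a parser is reported below. It is only an illustration (not necessarily identical to &#039;&#039;parse_qps.sh&#039;&#039;) and it assumes that the quasiparticle output files are named &#039;&#039;o-Xb_YRy.qp&#039;&#039; following the &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; labels used above:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: collect the QP energies at fixed BndsRnXp (here 10 bands) into 10b.dat&lt;br /&gt;
 nb=10&lt;br /&gt;
 rm -f ${nb}b.dat&lt;br /&gt;
 for ry in 1 2 3 4 5 ; do&lt;br /&gt;
   qp=o-${nb}b_${ry}Ry.qp&lt;br /&gt;
   ngs=$(grep &amp;quot;X G&amp;quot; ${qp} | head -1 | awk &#039;{print $NF}&#039;)    # G-vectors used in the response block&lt;br /&gt;
   val=$(grep &amp;quot;7 * 8&amp;quot; ${qp} | awk &#039;{print $3+$4}&#039;)          # valence QP energy (Eo + E-Eo)&lt;br /&gt;
   con=$(grep &amp;quot;7 * 9&amp;quot; ${qp} | awk &#039;{print $3+$4}&#039;)          # conduction QP energy&lt;br /&gt;
   echo &amp;quot;${ry} ${ngs} ${val} ${con}&amp;quot; &amp;gt;&amp;gt; ${nb}b.dat&lt;br /&gt;
 done&lt;br /&gt;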
&lt;br /&gt;
Once we have collected all the quasiparticle values we can plot the gap, or the valence and conduction band energies separately, as a function of the block size or energy cutoff:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:3 w lp t &amp;quot; Valence BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:3 w lp t &amp;quot;Valence BndsRnXp=20&amp;quot;,.. &lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:4 w lp t &amp;quot; Conduction BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:4 w lp t &amp;quot;Conduction BndsRnXp=20&amp;quot;,..&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot; Gap BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot;gap BndsRnXp=20&amp;quot;,..  &lt;br /&gt;
or both using e.g. the [[gnuplot_scripts|ppa_gap.gnu]] gnuplot script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect screening parameters&amp;quot;&amp;gt;&lt;br /&gt;
File:ppa2.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:ppa3.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking at the plot we can see that:&lt;br /&gt;
* The two parameters are not totally independent (see e.g. the valence quasiparticle convergence), which is why we converged them simultaneously. &lt;br /&gt;
* The gap (energy difference) converges faster than the single quasiparticle states.&lt;br /&gt;
* The convergence criterion depends on the degree of accuracy we are looking for, but considering the approximations behind the calculations (plasmon-pole etc.), it is not always a good idea to enforce too strict a criterion.  &lt;br /&gt;
* Even if not fully converged, we can consider &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;=30 and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=3 Ry to be reasonable parameters.&lt;br /&gt;
&lt;br /&gt;
Finally, notice that in recent versions of Yambo (from 4.5 onwards) there is a terminator for the dielectric constant that accelerates convergence with the number of bands;&lt;br /&gt;
you can activate it by setting &amp;lt;code&amp;gt;XTermKind= &amp;quot;BG&amp;quot;&amp;lt;/code&amp;gt;. You can compare the convergence of the dielectric constant with and without the terminator,&lt;br /&gt;
similarly to what happens for the self-energy (see next section).&lt;br /&gt;
&lt;br /&gt;
====Converging the sum over states in the correlation self-energy====&lt;br /&gt;
From now on we will keep these parameters fixed and perform a convergence study on the sum over states in the correlation self-energy (Σc), controlled by &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to reuse the screening previously calculated, we can copy the plasmon pole databases saved in the &amp;lt;code&amp;gt;30b_3Ry&amp;lt;/code&amp;gt; directory into the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory. In this way the screening will be read by Yambo and not calculated again:&lt;br /&gt;
&lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./SAVE/.&lt;br /&gt;
&lt;br /&gt;
(Note: you may have to delete these files before running the BSE tutorials)&lt;br /&gt;
&lt;br /&gt;
In order to use these databases we have to be sure to have the same plasmon-pole parameters in our input files.&lt;br /&gt;
Edit &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; and modify &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; in order to have a number of bands in the range from 10 to 80, saving different files named &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;, &#039;&#039;gw_ppa_Gbnd20.in&#039;&#039; etc. You can also run the [[bash_scripts|generate_inputs_2.sh]] bash script to generate the required input files.&lt;br /&gt;
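&lt;br /&gt;
If you write the generator yourself, a minimal sketch could look like the following (an illustration only, not necessarily the content of &#039;&#039;generate_inputs_2.sh&#039;&#039;; it assumes &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; still contains the &amp;lt;code&amp;gt;GbndRnge&amp;lt;/code&amp;gt; line &amp;lt;code&amp;gt;1 | 40 |&amp;lt;/code&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: generate one input file per GbndRnge value from the converged-screening input&lt;br /&gt;
 for gb in 10 20 30 40 50 60 70 80 ; do&lt;br /&gt;
   sed &amp;quot;s/1 | 40 |/1 | ${gb} |/&amp;quot; gw_ppa_30b_3Ry.in &amp;gt; gw_ppa_Gbnd${gb}.in&lt;br /&gt;
 done&lt;br /&gt;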
&lt;br /&gt;
Next, launch yambo for each input:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and as done before we can inspect the obtained quasiparticle energies: &lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
for the valence bands, and &lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
for the conduction band.&lt;br /&gt;
&lt;br /&gt;
Collect the results in a text file &#039;&#039;Gbnd_conv.dat&#039;&#039; containing: the number of Gbnd, the valence energy, and the conduction energy.&lt;br /&gt;
Now, as done before we can plot the valence and conduction quasiparticle levels separately as well as the gap, as a function of the number of bands used in the summation:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2 w lp lt 7  t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3 w lp lt 7  t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2) w lp lt 7  t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect sum over states in correlation self-energy&amp;quot;&amp;gt;&lt;br /&gt;
File:Gbnd_val.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:Gbnd_cond.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
File:Gbnd_gap.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inspecting the plot we can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is rather slow and many bands are needed to get converged results.&lt;br /&gt;
* As observed above the gap (energy difference) converges faster than the single quasiparticle state energies.&lt;br /&gt;
&lt;br /&gt;
====Accelerating the sum over states convergence in the correlation self-energy====&lt;br /&gt;
In general the convergence with respect to the sum over states can be very cumbersome. Here we show how it can be mitigated by using &lt;br /&gt;
a technique developed by F. Bruneval and X. Gonze &amp;lt;ref name=&amp;quot;BG&amp;quot;&amp;gt; F. Bruneval and X. Gonze, Physical Review B 78, 085125 (2008)&amp;lt;/ref&amp;gt;. The basic idea is to replace the eigenenergies of the states that are not treated explicitly by a common energy, and to take into account, through the closure relation, all the states that are not explicitly included in the calculation.&lt;br /&gt;
To apply this technique in Yambo we need to activate the optional terminator variable [[Variables#GTermKind|GTermKind]]. You can edit the input file by setting:&lt;br /&gt;
 &lt;br /&gt;
 [[Variables#GTermKind|GTermKind]]= &amp;quot;BG&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for example in the file &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;. Now, repeat this operation in all the other &#039;&#039;gw_ppa_GbndX.in&#039;&#039; input files as well; a small loop that does this for you is sketched below.&lt;br /&gt;
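&lt;br /&gt;
This is only a minimal sketch (it assumes the variable is not already present in the files and simply appends it at the end of each input):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: add the GW terminator to every GbndRnge input file&lt;br /&gt;
 for f in gw_ppa_Gbnd*.in ; do&lt;br /&gt;
   grep -q GTermKind ${f} || echo &#039;GTermKind= &amp;quot;BG&amp;quot;&#039; &amp;gt;&amp;gt; ${f}&lt;br /&gt;
 done&lt;br /&gt;
&lt;br /&gt;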
We can now perform the same calculations once again with the terminator activated:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10_term&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20_term&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and collect the new results:&lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
&lt;br /&gt;
in a new file called &#039;&#039;Gbnd_conv_terminator.dat&#039;&#039;. Now we can plot the same quantities as before by looking at the effect of having introduced the terminator. &lt;br /&gt;
&lt;br /&gt;
 gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:2 w lp t &amp;quot;Valence with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:3 w lp t &amp;quot;Conduction with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:($3-$2) w lp t &amp;quot;Gap with terminator&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Effect of the terminator in convergence of QP energies with respect sum over states&amp;quot;&amp;gt;&lt;br /&gt;
File:val_t.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:cond_t.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:gap_t.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is accelerated, in particular for the single quasiparticle states.&lt;br /&gt;
* From the plot above we can see that &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;=40 is sufficient to obtain a converged gap.&lt;br /&gt;
&lt;br /&gt;
===Beyond the plasmon pole approximation: a full frequency approach - real axis integration===&lt;br /&gt;
All the calculations performed up to now were based on the plasmon pole approximation (PPA). Now we remove this approximation by evaluating numerically the frequency integral in the expression of the correlation self-energy (Σc). To this aim we need to evaluate the screening for a number of frequencies in an interval determined by the electron-hole energy differences (energy differences between empty and occupied states) entering the sum over states. &lt;br /&gt;
Let&#039;s build the input file for a full frequency calculation by simply typing:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff.in -g n -p r&lt;br /&gt;
&lt;br /&gt;
and then set the variables studied up to now to their converged values:&lt;br /&gt;
&lt;br /&gt;
 %[[Variables#BndsRnXd|BndsRnXd]]&lt;br /&gt;
 1 | 30 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXd|NGsBlkXd]]=3000 mRy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
 1 | 40 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]]=40000 mRy&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]] &lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 %QPkrange                      # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|8|9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and next vary, in different files (name them &#039;&#039;gw_ff10.in&#039;&#039; etc.), the number of frequencies at which we evaluate the screened Coulomb potential. This is done by setting [[Variables#ETStpsXd|ETStpsXd]]=10, 50, 100, 150, 200, 250. &lt;br /&gt;
You can also run the [[bash_scripts|generate_inputs_3.sh]] bash script to generate the required input files.&lt;br /&gt;
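&lt;br /&gt;
A possible sketch of this generator is shown below (an illustration only; it assumes the &#039;&#039;gw_ff.in&#039;&#039; file generated above already contains an &amp;lt;code&amp;gt;ETStpsXd&amp;lt;/code&amp;gt; line with its default value):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: one full-frequency input file per number of frequency steps&lt;br /&gt;
 for ns in 10 50 100 150 200 250 ; do&lt;br /&gt;
   sed &amp;quot;s/ETStpsXd=.*/ETStpsXd= ${ns}/&amp;quot; gw_ff.in &amp;gt; gw_ff${ns}.in&lt;br /&gt;
 done&lt;br /&gt;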
&lt;br /&gt;
Next launch yambo:&lt;br /&gt;
 $ yambo -F gw_ff10.in -J ff10&lt;br /&gt;
 $ yambo -F gw_ff50.in -J ff50&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Clearly, as we are evaluating the screening for a large number of frequencies, these calculations will be heavier than the case above, where the screening was calculated at only two frequencies (zero and the plasmon-pole frequency). &lt;br /&gt;
As before, collect the valence and conduction bands as a function of the number of frequencies in a file called &#039;&#039;gw_ff.dat&#039;&#039; and plot the behaviour of the conduction and valence bands and the gap.&lt;br /&gt;
 &lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Real axis integration, convergences with respect the used number of frequencies&amp;quot;&amp;gt;&lt;br /&gt;
File:ff_v.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt;&lt;br /&gt;
File:ff_c.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:ff_g.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
&lt;br /&gt;
* Oscillations are still present, indicating that even more frequencies have to be considered. In general, a real-axis calculation is very demanding. &lt;br /&gt;
* The final result of the gap obtained up to now does not differ too much from the one obtained at the plasmon-pole level (~50 meV)&lt;br /&gt;
&lt;br /&gt;
As the real-axis calculation is computationally demanding, in recent years an alternative method called the Multipole Approximation (MPA) has been developed in Yambo. The MPA can be seen as an extension of the PPA, in which a finite number of poles is used instead of just one. It has been found that even a small number of poles gives MPA results on par with a full real-axis calculation. For your reference, a tutorial on the MPA method is available at [[Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)|Quasi-particles and Self-energy within the Multipole Approximation (MPA)]]. For now, however, let us move on to the next section.&lt;br /&gt;
&lt;br /&gt;
===Secant Solver===&lt;br /&gt;
&lt;br /&gt;
The real axis integration also permits going beyond the linear expansion used to solve the quasiparticle equation. &lt;br /&gt;
The QP equation is a non-linear equation whose solution must be found using a suitable numerical algorithm. &lt;br /&gt;
[[File:Eqp_sec.png|none|x22px|caption]] &lt;br /&gt;
The most commonly used approach, based on the linearization of the self-energy operator, is the Newton method, which is the one we have used up to now. &lt;br /&gt;
Yambo can also perform a search of the QP energies using a non-linear iterative method based on the [https://en.wikipedia.org/wiki/Secant_method Secant iterative Method].&lt;br /&gt;
&lt;br /&gt;
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines to better approximate a root of a function &#039;&#039;f&#039;&#039;. The secant method can be thought of as a finite difference approximation of Newton&#039;s method. &lt;br /&gt;
The equation that defines the secant method is: &lt;br /&gt;
&lt;br /&gt;
[[File:secant_eq.png|none|x35px|caption]] &lt;br /&gt;
&lt;br /&gt;
The first two iterations of the secant method are shown in the following picture. The red curve shows the function f and the blue lines are the secants.&lt;br /&gt;
&lt;br /&gt;
[[File:Secant_method.png|center|250px|caption]] &lt;br /&gt;
&lt;br /&gt;
To see if there is any non-linear effect in the solution of the Dyson equation we compare the result of the calculation using the Newton solver as done before with the present case. &lt;br /&gt;
In order to use the secant method you need to edit one of the previous &#039;&#039;gw_ffN.in&#039;&#039; files, e.g. &#039;&#039;gw_ff100.in&#039;&#039;, and change:&lt;br /&gt;
&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;&lt;br /&gt;
to &lt;br /&gt;
 DysSolver= &amp;quot;s&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Repeat the calculations in the same way as before:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff100.in -J ff100&lt;br /&gt;
&lt;br /&gt;
Note that now the screening will &#039;&#039;not&#039;&#039; be calculated again, as it has been stored in the &#039;&#039;ffN&#039;&#039; directories, as can be seen in the report file:&lt;br /&gt;
&lt;br /&gt;
 [05] Dynamical Dielectric Matrix&lt;br /&gt;
 ================================&lt;br /&gt;
 [RD./ff10//ndb.em1d]----&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./ff10//ndb.em1d_fragment_1]--------------&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Comparing the output files, e.g. for the case with 100 frequencies:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp&#039;&#039;  &#039;&#039;&#039;Newton Solver:&#039;&#039;&#039;&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.41188     -0.08708      2.91254 &lt;br /&gt;
       7        9      3.877976     1.421968    -3.417357 &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp_01&#039;&#039; &#039;&#039;&#039;Secant Solver:&#039;&#039;&#039;&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.41188     -0.08715      2.93518 &lt;br /&gt;
       7        9      3.877976     1.401408    -3.731649 &lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
From the comparison, we see that the effect is of the order of 20 meV, which is of the same order of magnitude as the accuracy of GW calculations.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Interpolating Band Structures==&lt;br /&gt;
Up to now we have checked convergence for the gap. Now we want to calculate the quasiparticle corrections across the Brillouin zone in order to visualize the entire band structure along a path connecting high symmetry points.&lt;br /&gt;
&lt;br /&gt;
To do that we start by calculating the QP corrections in the plasmon-pole approximation for all the k-points of our sampling and for a number of bands around the gap. You can use a previous input file or generate a new one, &amp;lt;code&amp;gt; yambo -F gw_ppa_all_Bz.in -x -p p -g n &amp;lt;/code&amp;gt;, and set the parameters found in the previous tests:&lt;br /&gt;
&lt;br /&gt;
 EXXRLvcs=  40        Ry &lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 30 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 3000            mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
and we calculate it for all the available kpoints by setting:&lt;br /&gt;
 %QPkrange                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  1| 14|  6|11|&lt;br /&gt;
 %&lt;br /&gt;
 &lt;br /&gt;
Note that, as we have already calculated the screening with these parameters, you can save time and reuse that database: either run in the previous directory by using &amp;lt;code&amp;gt; -J 30b_3Ry &amp;lt;/code&amp;gt;, or, if you prefer to keep the new databases in a new all_Bz directory, create it and copy the screening databases there:&lt;br /&gt;
&lt;br /&gt;
 $ mkdir all_Bz &lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./all_Bz/&lt;br /&gt;
&lt;br /&gt;
and launch the calculation:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_all_Bz.in -J all_Bz&lt;br /&gt;
&lt;br /&gt;
Now we can inspect the output and see that it contains the correction for all the k points for the bands we asked:&lt;br /&gt;
&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       1        6      -1.299712    -0.219100     3.788044&lt;br /&gt;
       1        7      -1.296430    -0.241496     3.788092&lt;br /&gt;
       1        8      -1.296420    -0.243115     3.785947&lt;br /&gt;
       1        9       4.832399     0.952386    -3.679259&lt;br /&gt;
       1        10      10.76416     2.09915     -4.38743&lt;br /&gt;
       1        11      11.36167     2.48053     -3.91021&lt;br /&gt;
....&lt;br /&gt;
By plotting some of the &#039;&#039;o-all_Bz.qp&#039;&#039; columns it is possible to discuss some physical properties of the hBN quasiparticles. Using columns 3 and (3+4), i.e. plotting the GW energies with respect to the LDA energies, we can deduce the band gap renormalization and the stretching of the conduction/valence bands:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p  t &amp;quot;Eqp vs Elda&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[File:EqpvE0.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
Essentially we can see that the effect of the GW self-energy is the opening of the gap and a linear stretching of the conduction/valence bands, which can be estimated by performing a linear fit of the positive and negative energies (the zero is set at the top of the valence band). &lt;br /&gt;
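&lt;br /&gt;
For instance, the stretching of the valence bands can be estimated with a linear fit directly in gnuplot. The following is only a sketch: the fit range of roughly -8 to 0 eV for the valence states is indicative and should be adapted to your data.&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; f(x) = a*x + b&lt;br /&gt;
 gnuplot&amp;gt; fit [-8:0] f(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via a,b&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p t &amp;quot;Eqp vs Elda&amp;quot;, f(x) t sprintf(&amp;quot;valence fit, slope=%.2f&amp;quot;, a)&lt;br /&gt;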
&lt;br /&gt;
In order to calculate the band structure, however, we need to interpolate the values we have calculated above on a given path. In Yambo the interpolation is done by the executable &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; (Yambo Post Processing).&lt;br /&gt;
By typing: &lt;br /&gt;
 $ ypp -h  &lt;br /&gt;
you will recognize that in order to interpolate the bands we need to build a ypp input file using&lt;br /&gt;
 $ ypp -s b&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Before editing the &#039;&#039;ypp.in&#039;&#039; input file and running the interpolation, it is important to know that &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; uses an algorithm &amp;lt;ref&amp;gt; Warren E. Pickett, Henry Krakauer, and Philip B. Allen Phys. Rev. B 38, 2721 &amp;lt;/ref&amp;gt;  that cannot be used in presence of time-reversal (TR) symmetry. &lt;br /&gt;
As a first step we therefore remove the TR symmetry by typing:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -y&lt;br /&gt;
&lt;br /&gt;
and we uncomment the corresponding line to remove the TR.&lt;br /&gt;
&lt;br /&gt;
 fixsyms                      # [R] Reduce Symmetries&lt;br /&gt;
 #RmAllSymm                   # Remove all symmetries&lt;br /&gt;
 #RmTimeRev                   # Remove Time Reversal&lt;br /&gt;
&lt;br /&gt;
and launch&lt;br /&gt;
 &lt;br /&gt;
 $ ypp&lt;br /&gt;
&lt;br /&gt;
This will create a new directory called &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; where a &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory containing the electronic structure in the absence of TR is present. &lt;br /&gt;
We will calculate the band structure in the &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
 $ cd FixSymm&lt;br /&gt;
&lt;br /&gt;
After having performed the usual setup&lt;br /&gt;
&lt;br /&gt;
 $yambo&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
we can generate the input for the band interpolation:&lt;br /&gt;
&lt;br /&gt;
 $ypp -s b -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
and edit the &#039;&#039;ypp_bands.in&#039;&#039; file:&lt;br /&gt;
&lt;br /&gt;
 electrons                    # [R] Electrons (and holes)&lt;br /&gt;
 bnds                         # [R] Bands&lt;br /&gt;
 INTERP_mode= &amp;quot;BOLTZ&amp;quot;         # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach) &lt;br /&gt;
 OutputAlat= 4.716000           # [a.u.] Lattice constant used for &amp;quot;alat&amp;quot; ouput format&lt;br /&gt;
 cooIn= &amp;quot;rlu&amp;quot;                   # Points coordinates (in) cc/rlu/iku/alat&lt;br /&gt;
 cooOut= &amp;quot;rlu&amp;quot;     &lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   1 | 100 |                   # Number of bands&lt;br /&gt;
 %&lt;br /&gt;
 INTERP_Shell_Fac= 20.00000     # The bigger it is a higher number of shells is used&lt;br /&gt;
 CIRCUIT_E_DB_path= &amp;quot;none&amp;quot;      # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)&lt;br /&gt;
 BANDS_path= &amp;quot;&amp;quot;                 # BANDS path points labels (G,M,K,L...)&lt;br /&gt;
 BANDS_steps= 10  &lt;br /&gt;
 #BANDS_built_in                # Print the bands of the generating points of the circuit using the nearest internal point&lt;br /&gt;
 %BANDS_kpts   &lt;br /&gt;
 % &lt;br /&gt;
&lt;br /&gt;
We modify the following lines:&lt;br /&gt;
 BANDS_steps=30&lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   6 | 11 |                   # Number of bands &lt;br /&gt;
 %&lt;br /&gt;
 %BANDS_kpts                    # K points of the bands circuit&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.00000 |0.00000 |0.00000 |&lt;br /&gt;
  0.50000 |-.50000 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.50000 |&lt;br /&gt;
  0.00000 |0.00000 |0.50000 |&lt;br /&gt;
  0.50000 |-.50000 |0.50000 |&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
which means we assign 30 points to each segment, we ask to interpolate 3 occupied and 3 empty bands, and we assign the following path passing through the high symmetry points: K Γ M K H A L.&lt;br /&gt;
Launching:&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
will produce the output file &#039;&#039;o.bands_interpolated&#039;&#039; containing:&lt;br /&gt;
&lt;br /&gt;
                 &lt;br /&gt;
 #&lt;br /&gt;
 #   |k|        b6         b7         b8         b9         b10        b11        kx         ky         kz&lt;br /&gt;
 #&lt;br /&gt;
 #&lt;br /&gt;
     0.00000   -7.22092   -0.13402   -0.13395    4.67691    4.67694   10.08905    0.33300   -0.66667    0.00000&lt;br /&gt;
     0.03725   -7.18857   -0.17190   -0.12684    4.66126    4.71050   10.12529    0.32190   -0.64445    0.00000&lt;br /&gt;
...&lt;br /&gt;
 &lt;br /&gt;
and we can plot the bands using gnuplot:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o.bands_interpolated&amp;quot; u 0:2 w l, &amp;quot;o.bands_interpolated&amp;quot; u 0:3 w l, ...&lt;br /&gt;
&lt;br /&gt;
[[File:bands_lda.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
and you can recognize the index of the high symmetry point by inspecting the last three columns.&lt;br /&gt;
Note that up to now we have interpolated the LDA band structure. In order to plot the GW band structure, we need to tell &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; in the input file where the &#039;&#039;ndb.QP&#039;&#039; database is found. This is achieved by adding in the &#039;&#039;ypp_bands.in&#039;&#039; file the line:&lt;br /&gt;
&lt;br /&gt;
  GfnQPdb= &amp;quot;E &amp;lt; ./all_Bz/ndb.QP&amp;quot;&lt;br /&gt;
&lt;br /&gt;
and relaunch &lt;br /&gt;
&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
Now the file &#039;&#039;o.bands_interpolated_01&#039;&#039; contains the GW interpolated band structure. We can plot the LDA and GW band structure together by using the gnuplot script [[gnuplot_scripts|bands.gnu]].&lt;br /&gt;
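&lt;br /&gt;
If you do not want to use the script, a minimal gnuplot sketch that overlays the two files could be the following (it assumes bands b6-b11 sit in columns 2-7 of both files, as in the output shown above, and draws the LDA bands in black and the GW bands in red; &#039;&#039;bands.gnu&#039;&#039; itself may differ):&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p for [i=2:7] &amp;quot;o.bands_interpolated&amp;quot; u 1:i w l lc rgb &amp;quot;black&amp;quot; notitle, \&lt;br /&gt;
          for [i=2:7] &amp;quot;o.bands_interpolated_01&amp;quot; u 1:i w l lc rgb &amp;quot;red&amp;quot; notitle&lt;br /&gt;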
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines  widths=500px heights=500px  perrow=2 caption=&amp;quot;Band structure of bulk hBN&amp;quot;&amp;gt;&lt;br /&gt;
File:hBN_bands.png| LDA and GW band structure &lt;br /&gt;
File:hBN_bands_lit.png| LDA and GW band structure from Ref. &amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*As expected the effect of the GW correction is to open the gap.&lt;br /&gt;
*Comparing the obtained band structure with the one found in the literature by Arnaud and coworkers &amp;lt;ref name=&amp;quot;Arnaud&amp;quot;&amp;gt; B. Arnaud, S. Lebegue, P. Rabiller, and M. Alouani, Phys. Rev. Lett. 96, 026402 (2006)&amp;lt;/ref&amp;gt; we find a very nice qualitative agreement. &lt;br /&gt;
*Quantitatively we find a smaller gap: about 5.2 eV (indirect gap) and 5.7 eV (direct gap), while Ref. &amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; reports 5.95 eV for the indirect gap and a minimum direct band gap of 6.47 eV. Other values are also reported in the literature, depending on the pseudopotentials used, the starting functional and the type of self-consistency (see below). &lt;br /&gt;
*The present tutorial has been done with a small k-point grid, which is an important parameter to be checked, so convergence with respect to the k-point sampling has to be validated.&lt;br /&gt;
&lt;br /&gt;
==Step 4: Summary of the convergence parameters==&lt;br /&gt;
We have calculated the band structure of hBN starting from a DFT calculation; here we summarize the main variables we have checked to achieve convergence:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; # [XX] Exchange RL components&lt;br /&gt;
Number of G-vectors in the exchange. This number should be checked carefully: generally a large number is needed, as the QP energies show a slow convergence. The calculation of the exchange part is rather fast. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; #[Xp] Polarization function bands&lt;br /&gt;
Number of bands in the independent-particle response function from which the dielectric matrix is calculated. This parameter also has to be checked carefully, together with NGsBlkXp, as the two variables are interconnected.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;  # [Xp] Response block size&lt;br /&gt;
Number of G-vector blocks in the dielectric matrix. This parameter also has to be checked carefully, together with BndsRnXp. A large number of bands and blocks can make the calculation very demanding.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]] &amp;lt;/code&amp;gt; # [Xp] [cc] Electric Field&lt;br /&gt;
Direction of the electric field for the calculation of the q=0 component of the dielectric function ε(q,ω). In a bulk system it can be set to (1,1,1); attention must be paid for non-3D systems.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]] &amp;lt;/code&amp;gt; # [Xp] Plasmon pole imaginary energy: this is the second frequency used to fit the Godby-Needs plasmon-pole model (PPM). If the results depend significantly on this frequency, the PPM is not adequate for your calculation and you need to go beyond it, e.g. with the real-axis integration. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]] &amp;lt;/code&amp;gt; # [GW] G[W] bands range&lt;br /&gt;
Number of bands used to expand the Green&#039;s function. This number is usually larger than the number of bands used to calculate the dielectric matrix. Single quasiparticle energies converge slowly with respect to GbndRnge; energy differences behave better. You can use the terminator technique to mitigate the slow convergence. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GDamping|GDamping]] &amp;lt;/code&amp;gt; # [GW] G[W] damping&lt;br /&gt;
Small damping used in the definition of the Green&#039;s function (the delta &lt;br /&gt;
parameter). The final result should not depend on it; it is usually set to 0.1 eV.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#dScStep|dScStep]] &amp;lt;/code&amp;gt; # [GW] &lt;br /&gt;
Energy step to evaluate Z factors&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#DysSolver|DysSolver]] &amp;lt;/code&amp;gt; # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
Solver used for the Dyson equation: &amp;quot;n&amp;quot; Newton linearization, &amp;quot;s&amp;quot; non-linear secant method.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GTermKind|GTermKind]] &amp;lt;/code&amp;gt; # [GW] GW terminator &lt;br /&gt;
Terminator for the self-energy &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;. We have seen how this speeds up the convergence with respect to the empty bands.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]] &amp;lt;/code&amp;gt; # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
K-points and band range where you want to calculate the GW corrections. The syntax is:&lt;br /&gt;
first kpoint | last kpoint | first band | last band&lt;br /&gt;
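&lt;br /&gt;
For example, the setting used in Step 3 above computes the corrections for bands 6 to 11 on all 14 k-points:&lt;br /&gt;
&lt;br /&gt;
 %QPkrange                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  1| 14|  6|11|&lt;br /&gt;
 %&lt;br /&gt;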
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Next_steps:_RPA_calculations_(standalone)&amp;diff=8670</id>
		<title>Next steps: RPA calculations (standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Next_steps:_RPA_calculations_(standalone)&amp;diff=8670"/>
		<updated>2025-05-18T13:42:23Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Non-local commutator */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Optical absorption in hBN: independent particle approximation ==&lt;br /&gt;
&lt;br /&gt;
[[File:HBN-bulk-3x3-annotated.png|x200px|Atomic structure of bulk hBN]]&lt;br /&gt;
&lt;br /&gt;
=== Background ===&lt;br /&gt;
The dielectric function in the long-wavelength limit, at the independent particle level (RPA without local fields), is essentially given by the following:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\epsilon_{\alpha, \alpha}(\omega)=1+\frac{16 \pi}{\Omega} \sum_{c, v} \sum_{\mathbf{k}} \frac{1}{E_{c \mathbf{k}}-E_{v \mathbf{k}}} \frac{\left|\left\langle v \mathbf{k}\left|\mathbf{p}_{\alpha}+\mathrm i\left[V^{\mathrm{NL}}, \mathbf{r}_{\alpha}\right]\right| c \mathbf{k}\right\rangle\right|^{2}}{\left(E_{c \mathbf{k}}-E_{v \mathbf{k}}\right)^{2}-(\omega+\mathrm i \gamma)^{2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In practice, Yambo does not use this expression directly but solves the Dyson equation for the susceptibility &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt;, which is described in the [[Local fields]] module.&lt;br /&gt;
&lt;br /&gt;
=== Choosing input parameters ===&lt;br /&gt;
Enter the folder for bulk hBN that contains the &#039;&#039;SAVE&#039;&#039; directory, run the initialization and generate the input file.&lt;br /&gt;
You can type &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt; and see the available options for different run-levels.  For an RPA optical spectrum calculation the correct option is &amp;lt;code&amp;gt;yambo -optics c&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;yambo -o c&amp;lt;/code&amp;gt;). Let&#039;s add some command line options:&lt;br /&gt;
&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo               &#039;&#039;(initialization)&#039;&#039;&lt;br /&gt;
 $ yambo -F yambo.in_IP -o c&lt;br /&gt;
This corresponds to optical properties in G-space at the independent particle level: in the input file this is indicated by (&amp;lt;code&amp;gt;Chimod= &amp;quot;IP&amp;quot;&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
===Optics runlevel===&lt;br /&gt;
For optical properties we are interested just in the long-wavelength limit &amp;lt;math&amp;gt;\mathbf q = 0&amp;lt;/math&amp;gt;. This always corresponds to the &#039;&#039;first&#039;&#039; &amp;lt;math&amp;gt;\mathbf q&amp;lt;/math&amp;gt;-point in the set of possible &amp;lt;math&amp;gt;\mathbf q =\mathbf k - \mathbf k&#039;&amp;lt;/math&amp;gt;-points. &lt;br /&gt;
Change the following variables in the input file to:&lt;br /&gt;
 % [[Variables#QpntsRX|QpntsRXd]]&lt;br /&gt;
  1 |  &#039;&#039;&#039;1&#039;&#039;&#039; |                   # [Xd] Transferred momenta&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#ETStpsX|ETStpsXd]]= &#039;&#039;&#039;1001&#039;&#039;&#039;               # [Xd] Total Energy steps&lt;br /&gt;
in order to select just the first &amp;lt;math&amp;gt;\mathbf q&amp;lt;/math&amp;gt;. The last variable ensures we generate a smooth spectrum. &lt;br /&gt;
Save the input file and launch the code, keeping the upper-case command line options (such as &amp;lt;code&amp;gt;-F&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt;) and removing the lower-case runlevel options:&lt;br /&gt;
 $ yambo -F yambo.in_IP -J Full&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Optics&lt;br /&gt;
 &amp;lt;---&amp;gt; [LA] SERIAL linear algebra&lt;br /&gt;
 &amp;lt;---&amp;gt; [DIP] Checking dipoles header&lt;br /&gt;
 &amp;lt;---&amp;gt; [X-CG] R(p) Tot o/o(of R):    5499   52992     100&lt;br /&gt;
 &amp;lt;02s&amp;gt; Xo@q[1] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;02s&amp;gt; [06] Timing Overview&lt;br /&gt;
 &amp;lt;02s&amp;gt; [07] Memory Overview&lt;br /&gt;
 &amp;lt;02s&amp;gt; [08] Game Over &amp;amp; Game summary&lt;br /&gt;
  &lt;br /&gt;
 $ ls&lt;br /&gt;
 Full   SAVE  yambo.in_IP   r_setup&lt;br /&gt;
 o-Full.eel_q1_ip  o-Full.eps_q1_ip  r-Full_optics_chi&lt;br /&gt;
Let&#039;s take a moment to understand what Yambo has done inside the Optics runlevel:&lt;br /&gt;
# Compute the &amp;lt;math&amp;gt;[\mathbf r, V^\mathrm{NL}]&amp;lt;/math&amp;gt; term&lt;br /&gt;
# Read the wavefunctions from disc [WF]&lt;br /&gt;
# Compute the &#039;&#039;dipoles&#039;&#039;, i.e. matrix elements of &amp;lt;math&amp;gt;\mathbf p&amp;lt;/math&amp;gt;&lt;br /&gt;
# Write the dipoles to disk as &#039;&#039;Full/ndb.dip*&#039;&#039; databases. This you can see in the report file:&lt;br /&gt;
 $ grep -A20 &amp;quot;WR&amp;quot; r-Full_optics_*&lt;br /&gt;
 [WR./Full//ndb.dipoles]---------------------------------------------------------&lt;br /&gt;
  Brillouin Zone Q/K grids (IBZ/BZ)                :   14   72   14   72&lt;br /&gt;
  RL vectors                                       :  1491 [WF]&lt;br /&gt;
  Fragmentation                                    : yes&lt;br /&gt;
  Electronic Temperature                           :  0.000000 [K]&lt;br /&gt;
  Bosonic    Temperature                           :  0.000000 [K]&lt;br /&gt;
  DIP band range                                   :    1  100&lt;br /&gt;
  DIP band range limits                            :   8   9&lt;br /&gt;
  DIP e/h energy range                             : -1.000000 -1.000000 [eV]&lt;br /&gt;
  RL vectors in the sum                            :  1491&lt;br /&gt;
  [r,Vnl] included                                 : yes&lt;br /&gt;
  ...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol start=&amp;quot;5&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Finally, Yambo computes the non-interacting susceptibility &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; for this &amp;lt;math&amp;gt;\mathbf q&amp;lt;/math&amp;gt;, and writes the dielectric function inside the &#039;&#039;o-Full.eps_q1_ip&#039;&#039; file for plotting&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Energy cut off===&lt;br /&gt;
&lt;br /&gt;
Before plotting the output, let&#039;s change a few more variables. The previous calculation used &#039;&#039;all&#039;&#039; the G-vectors in expanding the wavefunctions, up to 1491 (~1016 components). This corresponds roughly to the cut off energy of 40Ry we used in the DFT calculation. Generally, however, we can use a smaller value. We use the verbosity to switch on this variable, and a new &#039;&#039;-J&#039;&#039; flag to avoid reading the previous database:&lt;br /&gt;
 $ yambo -F yambo.in_IP &#039;&#039;&#039;-V RL&#039;&#039;&#039; -o c&lt;br /&gt;
Change the &#039;&#039;&#039;value&#039;&#039;&#039; of &amp;lt;code&amp;gt;[[Variables#FFTGvecs|FFTGvecs]]&amp;lt;/code&amp;gt; and also its &#039;&#039;&#039;unit&#039;&#039;&#039; from &amp;lt;code&amp;gt;RL&amp;lt;/code&amp;gt; (number of G-vectors) to &amp;lt;code&amp;gt;Ry&amp;lt;/code&amp;gt; (energy in Rydberg):&lt;br /&gt;
 [[Variables#FFTGvecs|FFTGvecs]]= &#039;&#039;&#039;6&#039;&#039;&#039;           &#039;&#039;&#039;Ry&#039;&#039;&#039;    # [FFT] Plane-waves&lt;br /&gt;
Save the input file and launch the code again:&lt;br /&gt;
  $ yambo -F yambo.in_IP &#039;&#039;&#039;-J 6Ry &#039;&#039;&#039;&lt;br /&gt;
and then plot the &#039;&#039;o-Full.eps_q1_ip&#039;&#039; and &#039;&#039;o-6Ry.eps_q1_ip&#039;&#039; files:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; plot &amp;quot;o-Full.eps_q1_ip&amp;quot; w l,&amp;quot;o-6Ry.eps_q1_ip&amp;quot; w p&lt;br /&gt;
&lt;br /&gt;
[[File:CH-hBN-6Ry.png|none|500px|Yambo tutorial image]]&lt;br /&gt;
&lt;br /&gt;
There is very little difference between the two spectra. This highlights an important point in calculating excited state properties: generally, fewer G-vectors are needed than in DFT calculations. Regarding the spectrum itself, the first peak occurs at about 4.4 eV. This is consistent with the minimum direct gap reported by Yambo: 4.28 eV. The comparison with experiment (not shown) is, however, very poor. &lt;br /&gt;
&lt;br /&gt;
If you made some mistake and cannot reproduce this figure, you should check the value of &amp;lt;code&amp;gt;[[Variables#FFTGvecs|FFTGvecs]]&amp;lt;/code&amp;gt; in the input file, delete the &#039;&#039;6Ry&#039;&#039; folder, and try again - taking care to plot the right file! (e.g. &#039;&#039;o-6Ry.eps_q1_ip_01&#039;&#039;: the &amp;quot;_01&amp;quot; suffix means that while writing the output Yambo found another existing file with the name &amp;quot;o-6Ry.eps_q1_ip&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
===q-direction===&lt;br /&gt;
Now let&#039;s select a different component of the dielectric tensor:&lt;br /&gt;
 $ yambo -F yambo.in_IP -V RL -o c&lt;br /&gt;
 ...&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]]&lt;br /&gt;
 &#039;&#039;&#039;0.000000&#039;&#039;&#039; | 0.000000 | &#039;&#039;&#039;1.000000&#039;&#039;&#039; |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
 $ yambo -F yambo.in_IP -J 6Ry&lt;br /&gt;
This time yambo reads from the &#039;&#039;6Ry&#039;&#039; folder, so it does not need to compute the dipole matrix elements again, and the calculation is fast. Plotting gives:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; plot &amp;quot;o-6Ry.eps_q1_ip&amp;quot; t &amp;quot;q || x-axis&amp;quot; w l,&amp;quot;o-6Ry.eps_q1_ip_01&amp;quot; t &amp;quot;q || c-axis&amp;quot; w l&lt;br /&gt;
&lt;br /&gt;
[[File:CH-hBN-ac.png|none|500px|Yambo tutorial image]]&lt;br /&gt;
The absorption is suppressed in the stacking direction. As the interplanar spacing is increased, we would eventually arrive at the absorption of the BN sheet (see [[Local fields]] tutorial).&lt;br /&gt;
&lt;br /&gt;
===Non-local commutator===&lt;br /&gt;
Last, we show the effect of switching off the non-local commutator term (the term with &amp;lt;math&amp;gt; V^\mathrm{NL} &amp;lt;/math&amp;gt; in the equation at the start of this tutorial) due to the pseudopotential. As there is no option to do this inside yambo, you need to hide the database file. Change back to the &#039;&#039;q || (1 0 0)&#039;&#039; direction, and launch yambo with a different &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; option:&lt;br /&gt;
 $ mv SAVE/ns.kb_pp_pwscf SAVE/ns.kb_pp_pwscf_&#039;&#039;&#039;OFF&#039;&#039;&#039;&lt;br /&gt;
 $ yambo -F yambo.in_IP -J &#039;&#039;&#039;6Ry_NoVnl&#039;&#039;&#039; -o c             &lt;br /&gt;
&lt;br /&gt;
Change &lt;br /&gt;
&lt;br /&gt;
  %LongDrXd&lt;br /&gt;
&lt;br /&gt;
back to &lt;br /&gt;
&lt;br /&gt;
  &#039;&#039;&#039;1.000000&#039;&#039;&#039; | 0.000000 | &#039;&#039;&#039;0.000000&#039;&#039;&#039; | &lt;br /&gt;
&lt;br /&gt;
and run&lt;br /&gt;
 &lt;br /&gt;
 $ yambo -F yambo.in_IP -J 6Ry_NoVnl&lt;br /&gt;
&lt;br /&gt;
Note the warning in the output:&lt;br /&gt;
 &amp;lt;---&amp;gt; [WARNING] [r,Vnl^pseudo] not included in position and velocity dipoles&lt;br /&gt;
which also appears in the report file &amp;lt;code&amp;gt;r-6Ry_NoVnl_optics_dipoles_chi&amp;lt;/code&amp;gt; as &amp;lt;code&amp;gt;[r,Vnl] included       :no&amp;lt;/code&amp;gt;. The difference is tiny:&lt;br /&gt;
[[File:CH-hBN-Vnl.png|none|500px|Yambo tutorial image]]&lt;br /&gt;
&lt;br /&gt;
However, when your system is larger, with more projectors in the pseudopotential or more k-points (see the BSE tutorial), the inclusion of &amp;lt;math&amp;gt;V^\mathrm{NL}&amp;lt;/math&amp;gt; can make a huge difference in the computational load, so it is always worth checking to see if the terms are important in your system.&lt;br /&gt;
&lt;br /&gt;
==Optical absorption in 2D BN: local field effects ==&lt;br /&gt;
&lt;br /&gt;
[[File:HBN2.png|x200px|Atomic structure of 2D hBN]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Background ===&lt;br /&gt;
[[File:Yambo-Cheatsheet-5.0_P7.png|thumb|Cheatsheet on LFE|150px]]&lt;br /&gt;
The macroscopic dielectric function is obtained by including the so-called local field effects (LFE) in the calculation of the response function. Within the time-dependent DFT formalism this is achieved by solving the Dyson equation for the susceptibility &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt;. In reciprocal space this is given by:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\chi_{\mathbf{G}, \mathbf{G}^{\prime}}(\mathbf{q}, \omega) = \chi_{\mathbf{G}, \mathbf{G}^{\prime}}^{0}(\mathbf{q}, \omega)+\sum_{\mathbf{G}_{1}, \mathbf{G}_{2}} \chi_{\mathbf{G}, \mathbf{G}_{1}}^{0}(\mathbf{q}, \omega)\left[v_{\mathbf{G}_{1}}(\mathbf{q}) \delta_{\mathbf{G}_{1}, \mathbf{G}_{2}}+f_{\mathbf{G}_{1}, \mathbf{G}_{2}}^{x c}\right] \chi_{\mathbf{G}_{2}, \mathbf{G}^{\prime}}(\mathbf{q}, \omega)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The microscopic dielectric function is related to &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; by:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\epsilon_{\mathbf{G}, \mathbf{G}^{\prime}}^{-1}(\mathbf{q}, \omega)=\delta_{\mathbf{G}, \mathbf{G}^{\prime}}+v_{\mathbf{G}}(\mathbf{q}) \chi_{\mathbf{G}, \mathbf{G}^{\prime}}(\mathbf{q}, \omega)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the macroscopic dielectric function is obtained by taking the (0,0) component of the inverse microscopic one:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\epsilon_{M}(\omega)=\lim _{\mathrm{q} \rightarrow 0} \frac{1}{\epsilon_{\mathrm{G}=0, \mathrm{G}^{\prime}=0}^{-1}(\mathbf{q}, \omega)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Experimental observables like the optical absorption and the electron energy loss can be obtained from the macroscopic dielectric function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\operatorname{Abs}(\omega)=\operatorname{Im} \epsilon_{M}(\omega) \quad \operatorname{EELS}(\omega)=-\operatorname{Im} \frac{1}{\epsilon_{M}(\omega)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the following we will neglect the &amp;lt;math&amp;gt;f^{xc}&amp;lt;/math&amp;gt; term: we perform the calculation at the RPA level and consider just the Hartree term (from &amp;lt;math&amp;gt;v_G&amp;lt;/math&amp;gt;) in the kernel. If we also neglect the Hartree term, we arrive back at the independent particle approximation, since there is no kernel and &amp;lt;math&amp;gt;\chi = \chi_0&amp;lt;/math&amp;gt;.&lt;br /&gt;
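&lt;br /&gt;
To make the comparison between the two cases explicit (this is just the standard RPA bookkeeping restated, not an extra approximation): neglecting &amp;lt;math&amp;gt;f^{xc}&amp;lt;/math&amp;gt; the Dyson equation reduces to&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\chi_{\mathbf{G}, \mathbf{G}^{\prime}}(\mathbf{q}, \omega) = \chi_{\mathbf{G}, \mathbf{G}^{\prime}}^{0}(\mathbf{q}, \omega)+\sum_{\mathbf{G}_{1}} \chi_{\mathbf{G}, \mathbf{G}_{1}}^{0}(\mathbf{q}, \omega)\, v_{\mathbf{G}_{1}}(\mathbf{q})\, \chi_{\mathbf{G}_{1}, \mathbf{G}^{\prime}}(\mathbf{q}, \omega)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
while the spectrum &#039;&#039;without&#039;&#039; local fields corresponds to taking the head of the microscopic dielectric matrix directly, instead of inverting the full matrix:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\epsilon_{M}^{\mathrm{noLFE}}(\omega)=\lim _{\mathbf{q} \rightarrow 0}\left[1-v_{\mathbf{G}=0}(\mathbf{q})\, \chi_{\mathbf{0}, \mathbf{0}}^{0}(\mathbf{q}, \omega)\right]&amp;lt;/math&amp;gt;&lt;br /&gt;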
&lt;br /&gt;
=== Choosing input parameters ===&lt;br /&gt;
Enter the folder for 2D hBN that contains the SAVE directory, and generate the input file. To include the local fields variables in the input file, the correct option is &amp;lt;code&amp;gt;yambo -o c -k hartree&amp;lt;/code&amp;gt; (once again you can check it with &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt;). Let&#039;s start by running the calculation for light polarization &#039;&#039;q&#039;&#039; in the plane of the BN sheet:&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN-2D/YAMBO&lt;br /&gt;
 $ yambo        &#039;&#039;(Initialization)&#039;&#039;&lt;br /&gt;
 $ yambo -F yambo.in_RPA -V RL -o c -k hartree&lt;br /&gt;
We thus use a new input file &#039;&#039;yambo.in_RPA&#039;&#039;, switch on the &amp;lt;code&amp;gt;FFTGvecs&amp;lt;/code&amp;gt; variable, and label all outputs/databases with a &#039;&#039;q100&#039;&#039; tag. Make sure to set/modify all of the following variables to:&lt;br /&gt;
 [[Variables#FFTGvecs|FFTGvecs]]=     &#039;&#039;&#039;6        Ry&#039;&#039;&#039;    # [FFT] Plane-waves&lt;br /&gt;
 [[Variables#Chimod|Chimod]]= &amp;quot;Hartree&amp;quot;            # [X] IP/Hartree/ALDA/LRC/BSfxc&lt;br /&gt;
 [[Variables#NGsBlkXd|NGsBlkXd]]= &#039;&#039;&#039;    3        Ry&#039;&#039;&#039;    # [Xd] Response block size&lt;br /&gt;
 % [[Variables#QpntsRXd|QpntsRXd]]&lt;br /&gt;
  1 |  &#039;&#039;&#039;1&#039;&#039;&#039; |                   # [Xd] Transferred momenta&lt;br /&gt;
 %&lt;br /&gt;
 % [[Variables#EnRngeXd|EnRngeXd]]&lt;br /&gt;
  0.00000 | &#039;&#039;&#039;20.00000&#039;&#039;&#039; | eV    # [Xd] Energy range&lt;br /&gt;
 %&lt;br /&gt;
 % [[Variables#DmRngeXd|DmRngeXd]]&lt;br /&gt;
 &#039;&#039;&#039;0.200000&#039;&#039;&#039; | &#039;&#039;&#039;0.200000&#039;&#039;&#039; | eV    # [Xd] Damping range&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#ETStpsXd|ETStpsXd]]= 2001               # [Xd] Total Energy steps&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]]&lt;br /&gt;
 1.000000 | 0.000000 | 0.000000 |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
In this input file:&lt;br /&gt;
* We evaluate the &amp;lt;math&amp;gt;\mathbf q \rightarrow 0&amp;lt;/math&amp;gt; response function, choosing the direction of the limit parallel to the plane of the hBN sheet; &lt;br /&gt;
* We set a wider energy range than before, and a larger broadening;&lt;br /&gt;
* We select the Hartree kernel, and expand the G-vectors in the screening up to 3 Ry (about 85 G-vectors);&lt;br /&gt;
&lt;br /&gt;
===LFEs in periodic direction===&lt;br /&gt;
Now let&#039;s run the code with this new input file (on the CECAM machines, serial: about 2 minutes; parallel with 4 tasks: about 50 s)&lt;br /&gt;
 $ yambo -F yambo.in_RPA  -J q100 &lt;br /&gt;
and let&#039;s compare the absorption with and without the local fields included. By inspecting the &#039;&#039;o-q100.eps_q1_inv_rpa_dyson&#039;&#039; file we find that this information is given in the 2&amp;lt;math&amp;gt;^\mathrm{nd}&amp;lt;/math&amp;gt; and 4&amp;lt;math&amp;gt;^\mathrm{th}&amp;lt;/math&amp;gt; columns, respectively:&lt;br /&gt;
 $ head -n 40 o-q100.eps_q1_inv_rpa_dyson&lt;br /&gt;
 # Absorption @ Q(1) [q-&amp;gt;0 direction] : 1.0000000  0.0000000  0.0000000&lt;br /&gt;
 #  E/ev[1]     EPS-Im[2]   EPS-Re[3]   EPSo-Im[4]  EPSo-Re[5]&lt;br /&gt;
Plot the result:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; plot &amp;quot;o-q100.eps_q1_inv_rpa_dyson&amp;quot; u 1:2 w l t &amp;quot;RPA-LFE&amp;quot;,&amp;quot;o-q100.eps_q1_inv_rpa_dyson&amp;quot; u 1:4 w l t &amp;quot;noLFE&amp;quot;, &amp;quot;o-q100.eel_q1_inv_rpa_dyson&amp;quot; u 1:4 w l ls 7 dt 2 t &amp;quot;EELS&amp;quot;&lt;br /&gt;
[[File:CH-LFE4.png|none|600px|Yambo tutorial image]]&lt;br /&gt;
There is little influence of local fields in this case. This is generally the case for semiconductors or materials with a smoothly varying electronic density. We have also shown the EELS spectrum (&#039;&#039;o-q100.eel_q1_inv_rpa_dyson&#039;&#039;) for comparison.&lt;br /&gt;
&lt;br /&gt;
===LFEs in non-periodic direction===&lt;br /&gt;
Now let&#039;s switch to &#039;&#039;q&#039;&#039; perpendicular to the BN plane:&lt;br /&gt;
 $ yambo -F yambo.in_RPA -V RL -o c -k hartree        &#039;&#039;and set&#039;&#039;&lt;br /&gt;
 ...&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]]&lt;br /&gt;
 0.000000 | 0.000000 | &#039;&#039;&#039;1.000000&#039;&#039;&#039; |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 &lt;br /&gt;
You can try out the default parallel usage now, or run again in serial, i.e.&lt;br /&gt;
 $ yambo -F yambo.in_RPA  -J &#039;&#039;&#039;q001&#039;&#039;&#039;       &#039;&#039;(serial)&#039;&#039;&lt;br /&gt;
 $ mpirun -np 4 yambo -F yambo.in_RPA  -J &#039;&#039;&#039;q001&#039;&#039;&#039; &amp;amp;      &#039;&#039;(parallel, MPI only, 4 tasks)&#039;&#039;&lt;br /&gt;
As noted previously, in parallel runs the &#039;&#039;log&#039;&#039; files appear in the LOG folder; you can follow the execution with &amp;lt;code&amp;gt;tail -F LOG/l-q001_optics_chi_CPU_1&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Plotting the output file:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; plot &amp;quot;o-q001.eps_q1_inv_rpa_dyson&amp;quot; u 1:2 w l,&amp;quot;o-q001.eps_q1_inv_rpa_dyson&amp;quot; u 1:4 w l&lt;br /&gt;
[[File:CH-LFE6.png|none|600px|Yambo tutorial image]]&lt;br /&gt;
In this case, the absorption is strongly blueshifted with respect to the in-plane absorption. Furthermore, the influence of local fields is striking, and quenches the spectrum strongly. This is the well known depolarization effect. Local field effects are much stronger in the perpendicular direction because the charge inhomogeneity is dramatic. Many G-vectors are needed to account for the sharp change in the potential across the BN-vacuum interface.&lt;br /&gt;
&lt;br /&gt;
===Absorption versus EELS===&lt;br /&gt;
In order to understand this further, we plot the electron energy loss spectrum for this component and compare with the absorption:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; plot &amp;quot;o-q001.eps_q1_inv_rpa_dyson&amp;quot; w l,&amp;quot;o-q001.eel_q1_inv_rpa_dyson&amp;quot; w l&lt;br /&gt;
[[File:CH-LFE5.png|none|600px|Yambo tutorial image]]&lt;br /&gt;
The conclusion is that the absorption and EELS coincide for isolated systems. &lt;br /&gt;
To understand why this is, you need to consider the role of the &#039;&#039;macroscopic&#039;&#039; screening in the response function and the long-range part of the Coulomb potential. &lt;br /&gt;
See e.g.&amp;lt;ref&amp;gt;TDDFT from molecules to solids: The role of long‐range interactions, F. Sottile et al, International journal of quantum chemistry 102 (5), 684-701 (2005)&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Links==&lt;br /&gt;
* Back to [[ICTP 2022#Tutorials]]&lt;br /&gt;
* Back to [[CECAM VIRTUAL 2021#Tutorials]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| style=&amp;quot;width:100%&amp;quot; border=&amp;quot;1&amp;quot;&lt;br /&gt;
|style=&amp;quot;width:25%; text-align:left&amp;quot;|Prev: CECAM School Home -&amp;gt; [[First_steps:_walk_through_from_DFT(standalone)|First steps]] &lt;br /&gt;
|style=&amp;quot;width:40%; text-align:center&amp;quot;|Now: CECAM School Home -&amp;gt; [[Next steps: RPA calculations (standalone)|Next steps]]&lt;br /&gt;
|style=&amp;quot;width:35%; text-align:right&amp;quot;|Back to: [[CECAM_VIRTUAL_2021#Tutorials|CECAM School Home]] &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8668</id>
		<title>GW on h-BN (standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8668"/>
		<updated>2025-05-18T13:36:15Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Understanding the output */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a modified version of the full tutorial on GW computations present on the Yambo wiki. Later, you can have a look at the extended version at [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will learn how to:&lt;br /&gt;
* Calculate quasi-particle corrections in the Hartree-Fock approximation &lt;br /&gt;
* Calculate quasi-particle corrections in the GW approximation &lt;br /&gt;
* Choose the input parameters for a meaningful converged calculation&lt;br /&gt;
* Plot a band structure including quasi-particle corrections&lt;br /&gt;
We will use bulk hBN as an example system. Before starting, you need to obtain the appropriate tarball. See instructions on the [[Tutorials|main tutorials page]]. &amp;lt;br&amp;gt;&lt;br /&gt;
We strongly recommend that you first complete the [[First steps: a walk through from DFT to optical properties]] tutorial.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &#039;&#039;&#039;Prerequisites&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
At this stage, you should have already completed the following modules:&lt;br /&gt;
* [[Generating the Yambo databases|Generating the Yambo databases]]&lt;br /&gt;
* Step 2 Run [[Initialization]] tutorial&lt;br /&gt;
Now we can start to --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The aim of the present tutorial is to obtain quasiparticle corrections to energy levels using many-body perturbation theory (MBPT). &amp;lt;br&amp;gt;&lt;br /&gt;
The general non-linear quasiparticle equation reads:&lt;br /&gt;
[[File:Eqp_1.png|none|x26px|caption]] &lt;br /&gt;
As a first step we want to evaluate the self-energy Σ entering the quasiparticle equation. In the GW approach the self-energy can be separated into two components: a static term called the exchange self-energy (Σx), and a dynamical (energy-dependent) term called the correlation self-energy (Σc):&lt;br /&gt;
[[File:Sigma.png|none|x25px|caption]]&lt;br /&gt;
We will treat these two terms separately and demonstrate how to set the most important variables for calculating each term.&lt;br /&gt;
In practice we will compute  the quasi-particle corrections to the one particle Kohn-Sham eigenvalues obtained through a DFT calculation. &lt;br /&gt;
&lt;br /&gt;
The steps are the following:&lt;br /&gt;
&lt;br /&gt;
==Step 1: The Exchange Self Energy or HF quasi-particle correction==&lt;br /&gt;
&lt;br /&gt;
We start by evaluating the exchange Self-Energy and the corresponding Quasiparticle energies (Hartree-Fock energies). &lt;br /&gt;
Follow the module on &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; and then return to this tutorial.&lt;br /&gt;
&lt;br /&gt;
==Step 2: The Correlation Self-Energy and Quasiparticle Energies==&lt;br /&gt;
Once we have calculated the exchange part, we next turn our attention to the more demanding dynamical part. The correlation part of the self-energy in a plane wave representation reads:&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
In the expression for the correlation self energy, we have (1) a summation over bands, (2) an integral over the Brillouin Zone, and (3) a sum over the G vectors. In contrast with the case of Σx, the summation over bands extends over &#039;&#039;all&#039;&#039; bands (including the unoccupied ones), and so convergence tests are needed. Another important difference is that the Coulomb interaction is now screened so a fundamental ingredient is the evaluation of the dynamical dielectric matrix. The expression for the dielectric matrix, calculated at the RPA level and including local field effects, has been already treated in the [[Local fields|Local fields]] tutorial.&lt;br /&gt;
&lt;br /&gt;
In the following, we will see two ways to take into account the dynamical effects. First, we will see how to set the proper parameters to obtain a model dielectric function based on a widely used approximation, which models the energy dependence of each component of the dielectric matrix with a single pole function. &lt;br /&gt;
Secondly, we will see how to perform calculations by evaluating the dielectric matrix on a regular grid of frequencies. &lt;br /&gt;
&lt;br /&gt;
Once the correlation part of the self-energy is calculated, we will check the convergence of the different parameters with respect to some final quantity, such as the gap. &lt;br /&gt;
&lt;br /&gt;
After computing the frequency-dependent self-energy, we will discover that in order to solve the quasiparticle equation we need to know its value &#039;&#039;at the quasiparticle energy itself&#039;&#039;. In the following, unless explicitly stated, we will solve the non-linear quasiparticle equation to first order, by expanding the self-energy around the Kohn-Sham eigenvalue. In this way the quasiparticle equation reads:&lt;br /&gt;
&lt;br /&gt;
[[File:Eqp_2.png|none|x26px|caption]] &lt;br /&gt;
&lt;br /&gt;
where the normalization factor Z is defined as:&lt;br /&gt;
&lt;br /&gt;
[[File:z_fac.png|none|x40px|caption]] &lt;br /&gt;
&lt;br /&gt;
===The Plasmon Pole approximation===&lt;br /&gt;
As stated above, the basic idea of the plasmon-pole approximation is to approximate the frequency dependence of the dielectric matrix with a single pole function of the form:&lt;br /&gt;
[[File:ppa.png|none|x26px|caption]]&lt;br /&gt;
The two parameters R&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; and Ω&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; are obtained by a fit (for each component), after having calculated the RPA dielectric matrix at two given frequencies.&lt;br /&gt;
Yambo calculates the dielectric matrix in the static limit (ω=0) and at a user-defined frequency called the plasmon-pole frequency (ω=iωp). &lt;br /&gt;
Such an approximation has the big computational advantage of requiring the dielectric matrix at only two frequencies, and it leads to an analytical expression for the frequency integral in the correlation self-energy.&lt;br /&gt;
==== Input file generation ====&lt;br /&gt;
Let&#039;s start by building up the input file for a GW/PPA calculation, including the calculation of the exchange self-energy. From &amp;lt;code&amp;gt;yambo -H&amp;lt;/code&amp;gt; you should understand that the correct option is &amp;lt;code&amp;gt;yambo -x -p p -g n&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo -x -p p -g n -F gw_ppa.in&lt;br /&gt;
&lt;br /&gt;
Let&#039;s modify the input file in the following way: &lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] = 40         Ry    # [XX] Exchange RL components&lt;br /&gt;
 [[Variables#VXCRLvcs|VXCRLvcs]] = 3187        RL      # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;                   # [X] IP/Hartree/ALDA/LRC/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 10 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]]= 1000          mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#PPAPntXp|PPAPntXp]] = 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GDamping=  0.10000     eV    # [GW] G[W] damping&lt;br /&gt;
 dScStep=  0.10000      eV    # [GW] Energy step to evaluate Z factors&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;               # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  7|  7|  8|  9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Brief explanation of some settings:&lt;br /&gt;
* Similar to the Hartree Fock study, we will concentrate on the convergence of the &#039;&#039;&#039;direct&#039;&#039;&#039; gap of the system. Hence we select the last occupied (8) and first unoccupied (9) bands for k-point number 7 in the &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable. &lt;br /&gt;
* We also keep &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; at its converged value of 40 Ry as obtained in the &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; tutorial.&lt;br /&gt;
* For the moment we keep the plasmon-pole energy &amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]]&amp;lt;/code&amp;gt; fixed at its default value (1 Hartree = 27.21138 eV).&lt;br /&gt;
* We fix the direction of the electric field for the evaluation of the dielectric matrix to a non-specific value: &amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]]&amp;lt;/code&amp;gt;=(1,1,1).&lt;br /&gt;
* Later we will study convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;; for now just set them to the values indicated.&lt;br /&gt;
&lt;br /&gt;
==== Understanding the output ====&lt;br /&gt;
Let&#039;s look at the typical Yambo output. Run Yambo with an appropriate &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
   &lt;br /&gt;
 $ yambo -F gw_ppa.in -J 10b_1Ry&lt;br /&gt;
&lt;br /&gt;
In the standard output you can recognise the different steps of the calculation: calculation of the screening matrix (evaluation of the non-interacting and interacting response), calculation of the exchange self-energy, and finally the calculation of the correlation self-energy and quasiparticle energies. Moreover, information on memory usage and execution time is reported: &lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Dynamic Dielectric Matrix (PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;08s&amp;gt; Xo@q[3] |########################################| [100%] 03s(E) 03s(X)&lt;br /&gt;
 &amp;lt;08s&amp;gt; X@q[3] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [06] Bare local and non-local Exchange-Correlation&lt;br /&gt;
 &amp;lt;43s&amp;gt; EXS |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07] Dyson equation: Newton solver&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07.01] G0W0 (W PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; G0W0 (W PPA) |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; [07.02] QP properties and I/O&lt;br /&gt;
 &amp;lt;45s&amp;gt; [08] Game Over &amp;amp; Game summary&lt;br /&gt;
&lt;br /&gt;
Let&#039;s have a look at the report and output. The output file &#039;&#039;o-10b_1Ry.qp&#039;&#039; contains (for each band and k-point that we indicated in the input file) the values of the bare KS eigenvalue, its GW correction and the correlation part of the self-energy:&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.411876    -0.567723   2.322443&lt;br /&gt;
       7        9      3.877976     2.413773  -2.232241&lt;br /&gt;
&lt;br /&gt;
In the header you can see the details of the calculation; for instance, it reports that &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=1 Ry corresponds to 5 G-vectors: &lt;br /&gt;
&lt;br /&gt;
 #  X G`s            [used]:  5&lt;br /&gt;
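&lt;br /&gt;
If you are only interested in the resulting direct QP gap at this k-point, you can extract it with a one-liner such as the following (a minimal sketch: it sums the &#039;&#039;Eo&#039;&#039; and &#039;&#039;E-Eo&#039;&#039; columns for bands 8 and 9 and prints their difference in eV):&lt;br /&gt;
 $ awk &#039;/^ *7 / {E[$2]=$3+$4} END {print E[9]-E[8]}&#039; o-10b_1Ry.qp&lt;br /&gt;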
&lt;br /&gt;
Other information can be found in the report file &#039;&#039;r-10b_1Ry_em1d_ppa_HF_and_locXC_gw0&#039;&#039;, such as the renormalization factor defined above, the value of the &#039;&#039;exchange&#039;&#039; self-energy (non-local XC) and of the DFT exchange-correlation potential (local XC): &lt;br /&gt;
&lt;br /&gt;
 [07.02] QP properties and I/O&lt;br /&gt;
  =============================&lt;br /&gt;
  Legend (energies in eV):&lt;br /&gt;
  - B  : Band       - Eo  : bare energy&lt;br /&gt;
  - E  : QP energy  - Z   : Renormalization factor&lt;br /&gt;
  - So : Sc(Eo)     - S   : Sc(E)&lt;br /&gt;
  - dSp: Sc derivative precision&lt;br /&gt;
  - lXC: Starting Local XC (DFT)&lt;br /&gt;
  -nlXC: Starting non-Local XC (HF)&lt;br /&gt;
  QP [eV] @ K [7] (iku): 0.000000 -0.500000  0.000000&lt;br /&gt;
   B=8 Eo= -0.41 E= -0.98 E-Eo= -0.56 Re(Z)=0.81 Im(Z)=-.2368E-2 nlXC=-19.13 lXC=-16.11 So= 2.322&lt;br /&gt;
   B=9 Eo=  3.88 E=  6.29 E-Eo=  2.41 Re(Z)=0.83 Im(Z)=-.2016E-2 nlXC=-5.536 lXC=-10.67 So=-2.232&lt;br /&gt;
&lt;br /&gt;
Extended information can also be found in the output by activating the &amp;lt;code&amp;gt;[[Variables#ExtendOut|ExtendOut]]&amp;lt;/code&amp;gt; flag in the input. This optional flag appears when the input file is built with the &amp;lt;code&amp;gt;-V qp&amp;lt;/code&amp;gt; verbosity option. The plasmon-pole screening, the exchange self-energy and the quasiparticle energies are also saved in databases so that they can be reused in further runs:&lt;br /&gt;
&lt;br /&gt;
 $ ls ./10b_1Ry&lt;br /&gt;
 ndb.pp ndb.pp_fragment_1 ... ndb.HF_and_locXC ndb.QP&lt;br /&gt;
&lt;br /&gt;
===Convergence tests for a quasi particle calculation===&lt;br /&gt;
&lt;br /&gt;
Now we can check the convergence of the different variables entering the expression of the correlation part of the self-energy.&amp;lt;br&amp;gt; &lt;br /&gt;
First, we focus on the parameters governing the &#039;&#039;screening matrix&#039;&#039; that you have already seen in the [[RPA/IP]] section. In contrast to the calculation of the [[RPA/IP]] dielectric function, where you considered either the optical limit or a finite-q response (EELS), here the dielectric matrix will be calculated for &#039;&#039;all&#039;&#039; q-points determined by the choice of k-point sampling.&lt;br /&gt;
 &lt;br /&gt;
The parameters that need to be converged can be understood by looking at the expression of the dielectric matrix:&lt;br /&gt;
[[File:Yambo-CH5.png|none|x30px|Yambo tutorial image]]&lt;br /&gt;
where &#039;&#039;&amp;amp;chi;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&#039;&#039; is given by&lt;br /&gt;
[[File:Dyson_rpa.png|none|x40px|Yambo tutorial image]]&lt;br /&gt;
and  &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; is given by&lt;br /&gt;
[[File:ChiO.png|none|x45px|Yambo tutorial image]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; : The dimension of the microscopic inverse matrix, related to [[Local fields]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; : The sum on bands (c,v) in the independent particle &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Converging Screening Parameters====&lt;br /&gt;
Here we will check the convergence of the gap starting from the variables controlling the screening reported above: the bands employed to build the RPA response function &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and the number of blocks (G,G&#039;) of the dielectric matrix ε&amp;lt;sup&amp;gt;-1&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;G,G&#039;&amp;lt;/sub&amp;gt;  &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
In the next section we will study convergence with respect to the sum over states (sum over &#039;&#039;m&#039;&#039; in the Σ&amp;lt;sub&amp;gt;c&amp;lt;/sub&amp;gt; expression); here let&#039;s fix &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; to a reasonable value (40 bands). &lt;br /&gt;
&lt;br /&gt;
Let&#039;s build a series of input files differing by the values of bands and block sizes in &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;, considering &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in the range 10-40 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; in the range 1000 to 5000 mRy. To do this by hand, file by file, open the &#039;&#039;gw_ppa.in&#039;&#039; file in an editor and change to:&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]] = &#039;&#039;&#039;2000 mRy&#039;&#039;&#039;&lt;br /&gt;
while leaving the rest untouched. Repeat for 3000 mRy, 4000 mRy etc. Next, for each &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; change to:&lt;br /&gt;
 % [[Variables#BndsRnXp|BndsRnXp]]&lt;br /&gt;
   1 | 20 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
and repeat for 30, 40 and so on. Give a &#039;&#039;&#039;different name&#039;&#039;&#039; to each file: &#039;&#039;gw_ppa_Xb_YRy.in&#039;&#039; with X=10,20,30,40 and Y=1,2,3,4,5 Ry (i.e. &amp;lt;code&amp;gt;NGsBlkXp&amp;lt;/code&amp;gt;=1000,...,5000 mRy).&lt;br /&gt;
&lt;br /&gt;
This is obviously quite tedious. However, you can automate both the input construction and the code execution using bash or python scripts (indeed, later you will learn how to use the yambo-python [http://www.yambo-code.org/wiki/index.php?title=GW_tutorial._Convergence_and_approximations_(BN)] tool for this task). For now, you can use the simple [[bash_scripts|generate_inputs_1.sh]] bash script to generate the required input files (after copying the script you need to make it executable with &amp;lt;code&amp;gt;$ chmod +x name_of_the_script.sh&amp;lt;/code&amp;gt;).&lt;br /&gt;
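&lt;br /&gt;
If you prefer to write your own helper, a minimal sketch could look like the following (this is &#039;&#039;not&#039;&#039; the provided script: it assumes the template is the &#039;&#039;gw_ppa.in&#039;&#039; file prepared above and keys on the comment strings that appear in it):&lt;br /&gt;
 # a sketch, not the provided script: loop over polarization bands (b) and response cutoffs (ry)&lt;br /&gt;
 for b in 10 20 30 40; do&lt;br /&gt;
   for ry in 1 2 3 4 5; do&lt;br /&gt;
     sed -e &amp;quot;s/NGsBlkXp=.*/NGsBlkXp= ${ry}000 mRy/&amp;quot; -e &amp;quot;/Polarization function bands/s/.*/  1 | ${b} |  # [Xp] Polarization function bands/&amp;quot; gw_ppa.in &amp;gt; gw_ppa_${b}b_${ry}Ry.in&lt;br /&gt;
   done&lt;br /&gt;
 done&lt;br /&gt;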
&lt;br /&gt;
Finally launch the calculations:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_10b_1Ry.in -J 10b_1Ry&lt;br /&gt;
 $ yambo -F gw_ppa_10b_2Ry.in -J 10b_2Ry&lt;br /&gt;
 ...&lt;br /&gt;
 $ yambo -F gw_ppa_40b_5Ry.in -J 40b_5Ry&lt;br /&gt;
&lt;br /&gt;
Once the jobs have finished, we can collect the quasiparticle energies for fixed &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in different files named e.g. &#039;&#039;10b.dat, 20b.dat&#039;&#039; etc. for plotting, by putting in separate columns: the energy cutoff; the size of the G blocks; the quasiparticle energy of the valence band; and that of the conduction band.&lt;br /&gt;
To do this e.g. for the &#039;&#039;10b.dat&#039;&#039; file you can type:&lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 8&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
to parse the valence band quasiparticle energies  and &lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 9&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
for the conduction band, and put all the data in the &#039;&#039;10b.dat&#039;&#039; file. As there are many files to process, you can use the [[bash_scripts|parse_qps.sh]] script to create the &#039;&#039;10b.dat&#039;&#039; file, editing the script to change the &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; value for the other output files. &lt;br /&gt;
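&lt;br /&gt;
If you want to see how such a helper might look, here is a minimal sketch (again &#039;&#039;not&#039;&#039; the provided script; it assumes the job names used above, so that the outputs are called &#039;&#039;o-10b_1Ry.qp&#039;&#039; etc., and that the only header line containing &amp;quot;X G&amp;quot; is the one reporting the number of G-vectors):&lt;br /&gt;
 # sketch: collect the BndsRnXp=10 results into 10b.dat&lt;br /&gt;
 for ry in 1 2 3 4 5; do&lt;br /&gt;
   f=o-10b_${ry}Ry.qp&lt;br /&gt;
   ng=$(grep &amp;quot;X G&amp;quot; $f | awk &#039;{print $NF}&#039;)       # number of G-vectors in the response block&lt;br /&gt;
   ev=$(grep &amp;quot;7 * 8&amp;quot; $f | awk &#039;{print $3+$4}&#039;)    # valence QP energy (band 8)&lt;br /&gt;
   ec=$(grep &amp;quot;7 * 9&amp;quot; $f | awk &#039;{print $3+$4}&#039;)    # conduction QP energy (band 9)&lt;br /&gt;
   echo &amp;quot;$ry $ng $ev $ec&amp;quot;&lt;br /&gt;
 done &amp;gt; 10b.dat&lt;br /&gt;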
&lt;br /&gt;
Once we have collected all the quasiparticle values we can plot the gap, or the valence and conduction band energies separately, as a function of the block size or energy cutoff:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:3 w lp t &amp;quot; Valence BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:3 w lp t &amp;quot;Valence BndsRnXp=20&amp;quot;,.. &lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:4 w lp t &amp;quot; Conduction BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:4 w lp t &amp;quot;Conduction BndsRnXp=20&amp;quot;,..&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot; Gap BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot;gap BndsRnXp=20&amp;quot;,..  &lt;br /&gt;
or both using e.g. the [[gnuplot_scripts|ppa_gap.gnu]] gnuplot script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect screening parameters&amp;quot;&amp;gt;&lt;br /&gt;
File:ppa2.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:ppa3.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking at the plot we can see that:&lt;br /&gt;
* The two parameters are not totally independent (see e.g. the valence quasiparticle convergence), which is why we converged them simultaneously &lt;br /&gt;
* The gap (energy difference) converges faster than the single quasiparticle states&lt;br /&gt;
* The convergence criterion depends on the degree of accuracy we are looking for, but considering the approximations behind the calculations (plasmon-pole etc.), it is not always a good idea to enforce too strict a criterion.  &lt;br /&gt;
* Even if not fully converged, we can consider &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;=30 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=3 Ry to be reasonable parameters.&lt;br /&gt;
&lt;br /&gt;
Finally, notice that recent versions of Yambo (from 4.5 onwards) provide a terminator for the response function that accelerates convergence with respect to the number of bands:&lt;br /&gt;
you can activate it with the &amp;lt;code&amp;gt;-V resp&amp;lt;/code&amp;gt; verbosity and by setting &amp;lt;code&amp;gt;XTermKind= &amp;quot;BG&amp;quot;&amp;lt;/code&amp;gt;. You can then compare the convergence of the screening with and without the terminator,&lt;br /&gt;
similarly to what happens for the self-energy (see the next section).&lt;br /&gt;
&lt;br /&gt;
====Converging the sum over states in the correlation self-energy====&lt;br /&gt;
From now on we will keep these parameters fixed and perform a convergence study on the sum over states in the correlation self-energy (Σc), controlled by &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to reuse the screening previously calculated, we can copy the plasmon-pole databases saved in the &amp;lt;code&amp;gt;30b_3Ry&amp;lt;/code&amp;gt; directory into the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory. In this way the screening will be read by Yambo and not calculated again:&lt;br /&gt;
&lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./SAVE/.&lt;br /&gt;
&lt;br /&gt;
(Note: you may have to delete these files before running the BSE tutorials)&lt;br /&gt;
&lt;br /&gt;
In order to use the databases we have to be sure to have the same plasmon-pole parameters in our input files.&lt;br /&gt;
Edit &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; and modify &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; in order to have a number of bands ranging from 10 to 80, in different files named &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;, &#039;&#039;gw_ppa_Gbnd20.in&#039;&#039; etc. You can also run the [[bash_scripts|generate_inputs_2.sh]] bash script to generate the required input files.&lt;br /&gt;
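&lt;br /&gt;
A possible sketch of such a helper (not the provided script; it assumes &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; as template and that the only line containing the string &amp;quot;bands range&amp;quot; is the &amp;lt;code&amp;gt;GbndRnge&amp;lt;/code&amp;gt; one):&lt;br /&gt;
 # sketch: vary GbndRnge from 10 to 80 bands&lt;br /&gt;
 for n in 10 20 30 40 50 60 70 80; do&lt;br /&gt;
   sed &amp;quot;/bands range/s/.*/  1 | ${n} |  # [GW] G[W] bands range/&amp;quot; gw_ppa_30b_3Ry.in &amp;gt; gw_ppa_Gbnd${n}.in&lt;br /&gt;
 done&lt;br /&gt;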
&lt;br /&gt;
Next, launch yambo for each input:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and as done before we can inspect the obtained quasiparticle energies: &lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
for the valence band, and &lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
for the conduction band.&lt;br /&gt;
&lt;br /&gt;
Collect the results in a text file &#039;&#039;Gbnd_conv.dat&#039;&#039; containing: the number of Gbnd, the valence energy, and the conduction energy.&lt;br /&gt;
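&lt;br /&gt;
One way to build this file is a small loop (a sketch; note that when grepping a single file the filename is not prepended to the match, hence the &amp;lt;code&amp;gt;$3+$4&amp;lt;/code&amp;gt; columns here instead of the &amp;lt;code&amp;gt;$4+$5&amp;lt;/code&amp;gt; used above):&lt;br /&gt;
 # sketch: columns are number of bands, valence QP energy, conduction QP energy&lt;br /&gt;
 for n in 10 20 30 40 50 60 70 80; do&lt;br /&gt;
   ev=$(grep &amp;quot;7 * 8&amp;quot; o-Gbnd${n}.qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   ec=$(grep &amp;quot;7 * 9&amp;quot; o-Gbnd${n}.qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   echo &amp;quot;$n $ev $ec&amp;quot;&lt;br /&gt;
 done &amp;gt; Gbnd_conv.dat&lt;br /&gt;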
Now, as done before we can plot the valence and conduction quasiparticle levels separately as well as the gap, as a function of the number of bands used in the summation:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2 w lp lt 7  t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3 w lp lt 7  t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2) w lp lt 7  t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect sum over states in correlation self-energy&amp;quot;&amp;gt;&lt;br /&gt;
File:Gbnd_val.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:Gbnd_cond.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:Gbnd_gap.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inspecting the plot we can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is rather slow and many bands are needed to get converged results.&lt;br /&gt;
* As observed above the gap (energy difference) converges faster than the single quasiparticle state energies.&lt;br /&gt;
&lt;br /&gt;
====Accelerating the sum over states convergence in the correlation self-energy====&lt;br /&gt;
In general, the convergence with respect to the sum over states can be very cumbersome. Here we show how it can be mitigated by using &lt;br /&gt;
a technique developed by F. Bruneval and X. Gonze &amp;lt;ref name=&amp;quot;BG&amp;quot;&amp;gt; F. Bruneval and X. Gonze, Phys. Rev. B 78, 085125 (2008)&amp;lt;/ref&amp;gt;. The basic idea is to replace the eigenenergies of the states that are not treated explicitly by a common energy, and to take into account all the states not explicitly included in the calculation through the closure relation.&lt;br /&gt;
To apply this technique in Yambo we need to activate the optional terminator variable [[Variables#GTermKind|GTermKind]]. You can edit the input file by setting:&lt;br /&gt;
 &lt;br /&gt;
 [[Variables#GTermKind|GTermKind]]= &amp;quot;BG&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for example in the file &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;. Now, repeat this operation in all the other &#039;&#039;gw_ppa_GbndX.in&#039;&#039; input files.&lt;br /&gt;
We can now repeat the same calculations with the terminator activated:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10_term&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20_term&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and collect the new results:&lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
&lt;br /&gt;
in a new file called &#039;&#039;Gbnd_conv_terminator.dat&#039;&#039; (built as for &#039;&#039;Gbnd_conv.dat&#039;&#039; above). Now we can plot the same quantities as before to see the effect of introducing the terminator. &lt;br /&gt;
&lt;br /&gt;
 gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:2 w lp t &amp;quot;Valence with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:3 w lp t &amp;quot;Conduction with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:($3-$2) w lp t &amp;quot;Gap with terminator&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Effect of the terminator in convergence of QP energies with respect sum over states&amp;quot;&amp;gt;&lt;br /&gt;
File:val_t.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:cond_t.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:gap_t.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is accelerated, in particular for the single quasiparticle states.&lt;br /&gt;
* From the plot above we can see that &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;=40 is sufficient to obtain a converged gap&lt;br /&gt;
&lt;br /&gt;
===Beyond the plasmon pole approximation: a full frequency approach - real axis integration===&lt;br /&gt;
All the calculations performed up to now were based on the plasmon-pole approximation (PPA). Now we remove this approximation by evaluating numerically the frequency integral in the expression of the correlation self-energy (Σc). To this aim we need to evaluate the screening on a grid of frequencies, over an interval determined by the electron-hole energy differences (energy differences between empty and occupied states) entering the sum over states. &lt;br /&gt;
Let&#039;s build the input file for a full frequency calculation by simply typing:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff.in -g n -p r&lt;br /&gt;
&lt;br /&gt;
and we can set the variables studied up to now to their converged values:&lt;br /&gt;
&lt;br /&gt;
 %[[Variables#BndsRnXd|BndsRnXd]]&lt;br /&gt;
 1 | 30 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXd|NGsBlkXd]]=3000 mRy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
 1 | 40 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]]=40000 mRy&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]] &lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 %QPkrange                      # # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|8|9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Next, vary in different files (name them &#039;&#039;gw_ff10.in&#039;&#039; etc.) the number of frequencies at which we evaluate the screened Coulomb potential. This is done by setting [[Variables#ETStpsXd|ETStpsXd]]=10, 50, 100, 150, 200, 250. &lt;br /&gt;
You can also run the [[bash_scripts|generate_inputs_3.sh]] bash script to generate the required input files.&lt;br /&gt;
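&lt;br /&gt;
A corresponding sketch (not the provided script; it assumes the &amp;lt;code&amp;gt;ETStpsXd&amp;lt;/code&amp;gt; line is already present in &#039;&#039;gw_ff.in&#039;&#039; - if your verbosity level hides it, add it by hand first):&lt;br /&gt;
 # sketch: vary the number of frequencies used for the screening&lt;br /&gt;
 for nw in 10 50 100 150 200 250; do&lt;br /&gt;
   sed &amp;quot;s/ETStpsXd=.*/ETStpsXd= ${nw}/&amp;quot; gw_ff.in &amp;gt; gw_ff${nw}.in&lt;br /&gt;
 done&lt;br /&gt;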
&lt;br /&gt;
Next launch yambo:&lt;br /&gt;
 $ yambo -F gw_ff10.in -J ff10&lt;br /&gt;
 $ yambo -F gw_ff50.in -J ff50&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Clearly, as we are evaluating the screening for a large number of frequencies, these calculations will be heavier than the previous ones, where the screening was calculated at only two frequencies (zero and the plasmon-pole frequency). &lt;br /&gt;
As before, collect the valence and conduction bands as a function of the number of frequencies in a file called &#039;&#039;gw_ff.dat&#039;&#039; and plot the behaviour of the conduction and valence bands and the gap.&lt;br /&gt;
 &lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Real axis integration, convergences with respect the used number of frequencies&amp;quot;&amp;gt;&lt;br /&gt;
File:ff_v.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt;&lt;br /&gt;
File:ff_c.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:ff_g.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
&lt;br /&gt;
* Oscillations are still present, indicating that even more frequencies have to be considered. In general, a real-axis calculation is very demanding. &lt;br /&gt;
* The final result of the gap obtained up to now does not differ too much from the one obtained at the plasmon-pole level (~50 meV)&lt;br /&gt;
&lt;br /&gt;
As the real-axis calculation is computationally demanding, in recent years an alternative method called the Multipole Approximation (MPA) has been developed in Yambo. The MPA can be seen as an extension of the PPA, in which a finite number of poles is used instead of just one. It has been shown that even a small number of poles makes the MPA results on par with a full real-axis calculation. For your reference, a tutorial on the MPA method is available at [[Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)]]. For now, however, let us move on to the next section.&lt;br /&gt;
&lt;br /&gt;
===Secant Solver===&lt;br /&gt;
&lt;br /&gt;
The real-axis integration also permits going beyond the linear expansion when solving the quasiparticle equation. &lt;br /&gt;
The QP equation is a non-linear equation whose solution must be found using a suitable numerical algorithm. &lt;br /&gt;
[[File:Eqp_sec.png|none|x22px|caption]] &lt;br /&gt;
The most commonly used approach, based on the linearization of the self-energy operator, is the Newton method, which is the one we have used up to now. &lt;br /&gt;
Yambo can also perform a search of the QP energies using a non-linear iterative method based on the [https://en.wikipedia.org/wiki/Secant_method Secant iterative Method].&lt;br /&gt;
&lt;br /&gt;
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines to better approximate a root of a function &#039;&#039;f&#039;&#039;. The secant method can be thought of as a finite difference approximation of Newton&#039;s method. &lt;br /&gt;
The equation that defines the secant method is: &lt;br /&gt;
&lt;br /&gt;
[[File:secant_eq.png|none|x35px|caption]] &lt;br /&gt;
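&lt;br /&gt;
In standard notation (this is simply the textbook update rule, written out here for convenience), each new estimate of the root is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; x_{n}=x_{n-1}-f\left(x_{n-1}\right) \frac{x_{n-1}-x_{n-2}}{f\left(x_{n-1}\right)-f\left(x_{n-2}\right)} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where, in our case, &#039;&#039;f&#039;&#039; can be taken as the difference between the left- and right-hand sides of the quasiparticle equation evaluated at the trial energy.&lt;br /&gt;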
&lt;br /&gt;
The first two iterations of the secant method are shown in the following picture. The red curve shows the function f and the blue lines are the secants.&lt;br /&gt;
&lt;br /&gt;
[[File:Secant_method.png|center|250px|caption]] &lt;br /&gt;
&lt;br /&gt;
To see whether there is any non-linear effect in the solution of the Dyson equation, we compare the result obtained with the Newton solver, as done before, with the present case. &lt;br /&gt;
In order to use the secant method you need to edit one of the previous &#039;&#039;gw_ffN.in&#039;&#039; files, e.g. &#039;&#039;gw_ff100.in&#039;&#039;, and substitute:&lt;br /&gt;
&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;&lt;br /&gt;
with &lt;br /&gt;
 DysSolver= &amp;quot;s&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Repeat the calculations in the same way as before:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff100.in -J ff100&lt;br /&gt;
&lt;br /&gt;
Note that now the screening will &#039;&#039;not&#039;&#039; be calculated again, as it has been stored in the &#039;&#039;ffN&#039;&#039; directories, as can be seen in the report file:&lt;br /&gt;
&lt;br /&gt;
 [05] Dynamical Dielectric Matrix&lt;br /&gt;
 ================================&lt;br /&gt;
 [RD./ff10//ndb.em1d]----&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./ff10//ndb.em1d_fragment_1]--------------&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Comparing the output files, e.g. for the case with 100 freqs:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp&#039;&#039;  &#039;&#039;&#039;Newton Solver:&#039;&#039;&#039;&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.41188     -0.08708      2.91254 &lt;br /&gt;
       7        9      3.877976     1.421968    -3.417357 &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp_01&#039;&#039; &#039;&#039;&#039;Secant Solver:&#039;&#039;&#039;&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.41188     -0.08715      2.93518 &lt;br /&gt;
       7        9      3.877976     1.401408    -3.731649 &lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
From the comparison, we see that the effect is of the order of 20 meV, which is of the same order of magnitude as the accuracy of GW calculations.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Interpolating Band Structures==&lt;br /&gt;
Up to now we have checked convergence for the gap. Now we want to calculate the quasiparticle corrections across the Brillouin zone in order to visualize the entire band structure along a path connecting high symmetry points.&lt;br /&gt;
&lt;br /&gt;
To do that we start by calculating the QP corrections in the plasmon-pole approximation for all the k-points of our sampling and for a number of bands around the gap. You can use a previous input file or generate a new one: &amp;lt;code&amp;gt; yambo -F gw_ppa_all_Bz.in -x -p p -g n &amp;lt;/code&amp;gt; and set the parameters found in the previous tests:&lt;br /&gt;
&lt;br /&gt;
 EXXRLvcs=  40        Ry &lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 30 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 3000            mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
and we calculate it for all the available kpoints by setting:&lt;br /&gt;
 %QPkrange                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  1| 14|  6|11|&lt;br /&gt;
 %&lt;br /&gt;
 &lt;br /&gt;
Note that, as we have already calculated the screening with these parameters, you can save time and reuse that database: either run in the previous directory using &amp;lt;code&amp;gt; -J 30b_3Ry &amp;lt;/code&amp;gt;, or, if you prefer to keep the new databases in a new &#039;&#039;all_Bz&#039;&#039; directory, create it and copy the screening databases there:&lt;br /&gt;
&lt;br /&gt;
 $ mkdir all_Bz &lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./all_Bz/&lt;br /&gt;
&lt;br /&gt;
and launch the calculation:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_all_Bz.in -J all_Bz&lt;br /&gt;
&lt;br /&gt;
Now we can inspect the output and see that it contains the corrections for all the k-points and for the bands we asked for:&lt;br /&gt;
&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       1        6      -1.299712    -0.219100     3.788044&lt;br /&gt;
       1        7      -1.296430    -0.241496     3.788092&lt;br /&gt;
       1        8      -1.296420    -0.243115     3.785947&lt;br /&gt;
       1        9       4.832399     0.952386    -3.679259&lt;br /&gt;
       1        10      10.76416     2.09915     -4.38743&lt;br /&gt;
       1        11      11.36167     2.48053     -3.91021&lt;br /&gt;
....&lt;br /&gt;
By plotting some of the &#039;&#039;o-all_Bz.qp&#039;&#039; columns it is possible to discuss some physical properties of the hBN QPs. Using columns 3 and (3+4), i.e. plotting the GW energies with respect to the LDA energies, we can deduce the band-gap renormalization and the stretching of the conduction/valence bands:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p  t &amp;quot;Eqp vs Elda&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[File:EqpvE0.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
Essentially we can see that the effect of the GW self-energy is the opening of the gap and a linear stretching of the conduction/valence bands, which can be estimated by performing a linear fit of the positive and negative energies (the zero is set at the top of the valence band), as sketched below. &lt;br /&gt;
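&lt;br /&gt;
The linear fits can be done directly within gnuplot; a minimal sketch follows (the fit windows, chosen to separate occupied from empty states with the zero at the valence band top, are a guess that you should adapt to your own data):&lt;br /&gt;
 gnuplot&amp;gt; fv(x) = av*x + bv; av=1; bv=0&lt;br /&gt;
 gnuplot&amp;gt; fc(x) = ac*x + bc; ac=1; bc=0&lt;br /&gt;
 gnuplot&amp;gt; fit [-10:0] fv(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via av,bv&lt;br /&gt;
 gnuplot&amp;gt; fit [1:15]  fc(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via ac,bc&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p t &amp;quot;Eqp vs Elda&amp;quot;, fv(x) t &amp;quot;valence fit&amp;quot;, fc(x) t &amp;quot;conduction fit&amp;quot;&lt;br /&gt;
The fitted slopes &amp;lt;code&amp;gt;av&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ac&amp;lt;/code&amp;gt; then give the valence and conduction stretching factors directly.&lt;br /&gt;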
&lt;br /&gt;
In order to calculate the band structure, however, we need to interpolate the values we have calculated above on a given path. In Yambo the interpolation is done by the executable &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; (Yambo Post Processing).&lt;br /&gt;
By typing: &lt;br /&gt;
 $ ypp -h  &lt;br /&gt;
you will recognize that in order to interpolate the bands we need to build a ypp input file using&lt;br /&gt;
 $ ypp -s b&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Before editing the &#039;&#039;ypp.in&#039;&#039; input file and running the interpolation, it is important to know that &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; uses an algorithm &amp;lt;ref&amp;gt; Warren E. Pickett, Henry Krakauer, and Philip B. Allen Phys. Rev. B 38, 2721 &amp;lt;/ref&amp;gt;  that cannot be used in presence of time-reversal (TR) symmetry. &lt;br /&gt;
As a first step we therefore remove the TR symmetry by typing:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -y&lt;br /&gt;
&lt;br /&gt;
and we uncomment the corresponding line to remove the TR.&lt;br /&gt;
&lt;br /&gt;
 fixsyms                      # [R] Reduce Symmetries&lt;br /&gt;
 #RmAllSymm                   # Remove all symmetries&lt;br /&gt;
 #RmTimeRev                   # Remove Time Reversal&lt;br /&gt;
&lt;br /&gt;
and launch&lt;br /&gt;
 &lt;br /&gt;
 $ ypp&lt;br /&gt;
&lt;br /&gt;
This will create a new directory called &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; where a &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory containing the electronic structure in the absence of TR is present. &lt;br /&gt;
We will calculate the band structure in the &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
 $ cd FixSymmm&lt;br /&gt;
&lt;br /&gt;
After having performed the usual setup&lt;br /&gt;
&lt;br /&gt;
 $yambo&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
we can generate the input for the band interpolation:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -s b -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
and edit the &#039;&#039;ypp_bands.in&#039;&#039; file:&lt;br /&gt;
&lt;br /&gt;
 electrons                    # [R] Electrons (and holes)&lt;br /&gt;
 bnds                         # [R] Bands&lt;br /&gt;
 INTERP_mode= &amp;quot;BOLTZ&amp;quot;         # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach) &lt;br /&gt;
 OutputAlat= 4.716000           # [a.u.] Lattice constant used for &amp;quot;alat&amp;quot; ouput format&lt;br /&gt;
 cooIn= &amp;quot;rlu&amp;quot;                   # Points coordinates (in) cc/rlu/iku/alat&lt;br /&gt;
 cooOut= &amp;quot;rlu&amp;quot;     &lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   1 | 100 |                   # Number of bands&lt;br /&gt;
 %&lt;br /&gt;
 INTERP_Shell_Fac= 20.00000     # The bigger it is a higher number of shells is used&lt;br /&gt;
 CIRCUIT_E_DB_path= &amp;quot;none&amp;quot;      # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)&lt;br /&gt;
 BANDS_path= &amp;quot;&amp;quot;                 # BANDS path points labels (G,M,K,L...)&lt;br /&gt;
 BANDS_steps= 10  &lt;br /&gt;
 #BANDS_built_in                # Print the bands of the generating points of the circuit using the nearest internal point&lt;br /&gt;
 %BANDS_kpts   &lt;br /&gt;
 % &lt;br /&gt;
&lt;br /&gt;
We modify the following lines:&lt;br /&gt;
 BANDS_steps=30&lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   6 | 11 |                   # Number of bands &lt;br /&gt;
 %&lt;br /&gt;
 %BANDS_kpts                    # K points of the bands circuit&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.00000 |0.00000 |0.00000 |&lt;br /&gt;
  0.50000 |-.50000 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.50000 |&lt;br /&gt;
  0.00000 |0.00000 |0.50000 |&lt;br /&gt;
  0.50000 |-.50000 |0.50000 |&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
which means we assign 30 points to each segment, we ask to interpolate 3 occupied and 3 empty bands, and we assign a path passing through the high-symmetry points K Γ M K H A L.&lt;br /&gt;
Launching:&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
will produce the output file &#039;&#039;o.bands_interpolated&#039;&#039; containing:&lt;br /&gt;
&lt;br /&gt;
                 &lt;br /&gt;
 #&lt;br /&gt;
 #   |k|        b6         b7         b8         b9         b10        b11        kx         ky         kz&lt;br /&gt;
 #&lt;br /&gt;
 #&lt;br /&gt;
     0.00000   -7.22092   -0.13402   -0.13395    4.67691    4.67694   10.08905    0.33300   -0.66667    0.00000&lt;br /&gt;
     0.03725   -7.18857   -0.17190   -0.12684    4.66126    4.71050   10.12529    0.32190   -0.64445    0.00000&lt;br /&gt;
...&lt;br /&gt;
 &lt;br /&gt;
and we can plot the bands using gnuplot:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o.bands_interpolated&amp;quot; u 0:2 w l, &amp;quot;o.bands_interpolated&amp;quot; u 0:3 w l, ...&lt;br /&gt;
&lt;br /&gt;
[[File:bands_lda.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
and you can recognize the index of the high symmetry point by inspecting the last three columns.&lt;br /&gt;
Note that up to now we have interpolated the LDA band structure. In order to plot the GW band structure, we need to tell &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; in the input file where the &#039;&#039;ndb.QP&#039;&#039; database is found. This is achieved by adding in the &#039;&#039;ypp_bands.in&#039;&#039; file the line:&lt;br /&gt;
&lt;br /&gt;
  GfnQPdb= &amp;quot;E &amp;lt; ./all_Bz/ndb.QP&amp;quot;&lt;br /&gt;
&lt;br /&gt;
and relaunch &lt;br /&gt;
&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
Now the file &#039;&#039;o.bands_interpolated_01&#039;&#039; contains the GW interpolated band structure. We can plot the LDA and GW band structure together by using the gnuplot script [[gnuplot_scripts|bands.gnu]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines  widths=500px heights=500px  perrow=2 caption=&amp;quot;Band structure of bulk hBN&amp;quot;&amp;gt;&lt;br /&gt;
File:hBN_bands.png| LDA and GW band structure &lt;br /&gt;
File:hBN_bands_lit.png| LDA and GW band structure from Ref. &amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*As expected the effect of the GW correction is to open the gap.&lt;br /&gt;
*Comparing the obtained band structure with the one found in the literature by Arnaud and coworkers &amp;lt;ref name=&amp;quot;Arnaud&amp;quot;&amp;gt; B. Arnaud, S. Lebegue, P. Rabiller, and M. Alouani, Phys. Rev. Lett. 96, 026402 (2006)&amp;lt;/ref&amp;gt; we find very good qualitative agreement. &lt;br /&gt;
*Quantitatively we find a smaller gap: about 5.2 eV (indirect) and 5.7 eV (direct), while Ref.&amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; reports 5.95 eV for the indirect gap and a minimum direct gap of 6.47 eV. Other values are also reported in the literature, depending on the pseudopotentials, starting functional and type of self-consistency used (see below). &lt;br /&gt;
*The present tutorial has been run with a small k-point grid, which is an important parameter, so convergence with respect to the k-point sampling has to be validated.&lt;br /&gt;
&lt;br /&gt;
==Step 4: Summary of the convergence parameters==&lt;br /&gt;
We have calculated the band structure of hBN starting from a DFT calculation; here we summarize the main variables we have checked to achieve convergence:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; # [XX] Exchange RL components&lt;br /&gt;
Number of G-vectors in the exchange. This number should be checked carefully. Generally a large number is needed, as the QP energies show a slow convergence. The calculation of the exchange part is rather fast. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; #[Xp] Polarization function bands&lt;br /&gt;
Number of bands in the independent response function from which the dielectric matrix is calculated. This parameter also has to be checked carefully, together with NGsBlkXp, as the two variables are interconnected.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;  # [Xp] Response block size&lt;br /&gt;
Number of G-vector blocks in the dielectric matrix. This parameter also has to be checked carefully, together with BndsRnXp. A large number of bands and blocks can make the calculation very demanding.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]] &amp;lt;/code&amp;gt; # [Xp] [cc] Electric Field&lt;br /&gt;
Direction of the electric field for the calculation of the q=0 component of the dielectric function e(q,w). In a bulk system it can be set to (1,1,1); attention must be paid for non-3D systems.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]] &amp;lt;/code&amp;gt; # [Xp] Plasmon pole imaginary energy: this is the second frequency used to fit the Godby-Needs plasmon-pole model (PPM). If the results depend significantly on this frequency, the PPM is not adequate for your calculation and you need to go beyond it, e.g. with real-axis integration. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]] &amp;lt;/code&amp;gt; # [GW] G[W] bands range&lt;br /&gt;
Number of bands used to expand the Green&#039;s function. This number is usually larger than the number of bands used to calculate the dielectric matrix. Single quasiparticle energies converge slowly with respect to GbndRnge; energy differences behave better. You can use the terminator technique to mitigate the slow convergence. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GDamping|GDamping]] &amp;lt;/code&amp;gt; # [GW] G[W] damping&lt;br /&gt;
Small damping in the Green&#039;s function definition (the delta parameter). The final result should not depend on it; it is usually set to 0.1 eV.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#dScStep|dScStep]] &amp;lt;/code&amp;gt; # [GW] &lt;br /&gt;
Energy step to evaluate Z factors&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#DysSolver|DysSolver]] &amp;lt;/code&amp;gt; # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
Parameter controlling the solution of the Dyson equation: &amp;quot;n&amp;quot; Newton linearization, &amp;quot;s&amp;quot; non-linear secant method.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GTermKind|GTermKind]] &amp;lt;/code&amp;gt; # [GW] GW terminator &lt;br /&gt;
Terminator for the self-energy&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;. We have seen how this speeds up the convergence with respect to the number of empty bands.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#%QPkrange |QPkrange ]] &amp;lt;/code&amp;gt; # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
K-points and band range where you want to calculate the GW correction. The syntax is&lt;br /&gt;
 first kpoint | last kpoint | first band | last band&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8667</id>
		<title>GW on h-BN (standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8667"/>
		<updated>2025-05-18T13:35:19Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Secant Solver */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a modified version of the full tutorial on GW computations present on the Yambo wiki. Later, you can have a look at the extended version at [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will learn how to:&lt;br /&gt;
* Calculate quasi-particle corrections in the Hartree-Fock approximation &lt;br /&gt;
* Calculate quasi-particle corrections in the GW approximation &lt;br /&gt;
* Choose the input parameters for a meaningful converged calculation&lt;br /&gt;
* Plot a band structure including quasi-particle corrections&lt;br /&gt;
We will use bulk hBN as an example system. Before starting, you need to obtain the appropriate tarball. See instructions on the [[Tutorials|main tutorials page]]. &amp;lt;br&amp;gt;&lt;br /&gt;
We strongly recommend that you first complete the [[First steps: a walk through from DFT to optical properties]] tutorial.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &#039;&#039;&#039;Prerequisites&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
At this stage, you should have already completed the following modules:&lt;br /&gt;
* [[Generating the Yambo databases|Generating the Yambo databases]]&lt;br /&gt;
* Step 2 Run [[Initialization]] tutorial&lt;br /&gt;
Now we can start to --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The aim of the present tutorial is to obtain quasiparticle corrections to the energy levels using many-body perturbation theory (MBPT). &amp;lt;br&amp;gt;&lt;br /&gt;
The general non-linear quasiparticle equation reads:&lt;br /&gt;
[[File:Eqp_1.png|none|x26px|caption]] &lt;br /&gt;
As a first step we want to evaluate the self energy Σ entering in the quasiparticle equation. In the GW approach the self-energy can be separated into two components: a static term called the exchange self-energy (Σx), and a dynamical term (energy dependent) called the correlation self-energy (Σc):&lt;br /&gt;
[[File:Sigma.png|none|x25px|caption]]&lt;br /&gt;
We will treat these two terms separately and demonstrate how to set the most important variables for calculating each term.&lt;br /&gt;
In practice we will compute  the quasi-particle corrections to the one particle Kohn-Sham eigenvalues obtained through a DFT calculation. &lt;br /&gt;
&lt;br /&gt;
The steps are the following:&lt;br /&gt;
&lt;br /&gt;
==Step 1: The Exchange Self Energy or HF quasi-particle correction==&lt;br /&gt;
&lt;br /&gt;
We start by evaluating the exchange Self-Energy and the corresponding Quasiparticle energies (Hartree-Fock energies). &lt;br /&gt;
Follow the module on &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; and then return to this tutorial &#039;&#039;How to obtain the quasiparticle band structure of a bulk material: h-BN&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Step 2: The Correlation Self-Energy and Quasiparticle Energies==&lt;br /&gt;
Once we have calculated the exchange part, we next turn our attention to the more demanding dynamical part. The correlation part of the self-energy in a plane wave representation reads:&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
In the expression for the correlation self energy, we have (1) a summation over bands, (2) an integral over the Brillouin Zone, and (3) a sum over the G vectors. In contrast with the case of Σx, the summation over bands extends over &#039;&#039;all&#039;&#039; bands (including the unoccupied ones), and so convergence tests are needed. Another important difference is that the Coulomb interaction is now screened so a fundamental ingredient is the evaluation of the dynamical dielectric matrix. The expression for the dielectric matrix, calculated at the RPA level and including local field effects, has been already treated in the [[Local fields|Local fields]] tutorial.&lt;br /&gt;
&lt;br /&gt;
In the following, we will see two ways to take into account the dynamical effects. First, we will see how to set the proper parameters to obtain a model dielectric function based on a widely used approximation, which models the energy dependence of each component of the dielectric matrix with a single pole function. &lt;br /&gt;
Secondly, we will see how to perform calculations by evaluating the dielectric matrix on a regular grid of frequencies. &lt;br /&gt;
&lt;br /&gt;
Once the correlation part of the self-energy is calculated, we will check the convergence of the different parameters with respect to some final quantity, such as the gap. &lt;br /&gt;
&lt;br /&gt;
After computing the frequency dependent self-energy, we will discover that in order to solve the quasiparticle equation we will need to know its value &#039;&#039;at the value of the quasiparticle itself&#039;&#039;. In the following, unless explicitly stated, we will solve the non-linear quasi-particle equation at first order, by expanding the self-energy around the Kohn-Sham eigenvalue. In this way the quasiparticle equation reads:&lt;br /&gt;
&lt;br /&gt;
[[File:Eqp_2.png|none|x26px|caption]] &lt;br /&gt;
&lt;br /&gt;
where the normalization factor Z is defined as:&lt;br /&gt;
&lt;br /&gt;
[[File:z_fac.png|none|x40px|caption]] &lt;br /&gt;
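For readers who cannot display the images, a LaTeX transcription of the two formulas above (the standard first-order linearized G0W0 expressions, consistent with the description in the text) reads:&lt;br /&gt;
 E_{\mathrm{QP}} = \epsilon_{\mathrm{KS}} + Z \, \langle \psi | \Sigma(\epsilon_{\mathrm{KS}}) - V_{xc} | \psi \rangle&lt;br /&gt;
 Z = \left[ 1 - \left. \frac{\partial \Sigma(\omega)}{\partial \omega} \right|_{\omega=\epsilon_{\mathrm{KS}}} \right]^{-1}&lt;br /&gt;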
&lt;br /&gt;
===The Plasmon Pole approximation===&lt;br /&gt;
As stated above, the basic idea of the plasmon-pole approximation is to approximate the frequency dependence of the dielectric matrix with a single pole function of the form:&lt;br /&gt;
[[File:ppa.png|none|x26px|caption]]&lt;br /&gt;
The two parameters R&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; and Ω&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; are obtained by a fit (for each component), after having calculated the RPA dielectric matrix at two given frequencies.&lt;br /&gt;
Yambo calculates the dielectric matrix in the static limit ( ω=0) and at a user defined frequency called the plasmon-pole frequency (ω=iωp). &lt;br /&gt;
Such an approximation has the big computational advantage of calculating the dielectric matrix for only two frequencies and leads to an analytical expression for the frequency integral of the correlation self-energy.&lt;br /&gt;
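&lt;br /&gt;
In LaTeX notation, one common way to write the single-pole model described above (here sketched for the inverse dielectric matrix, as in the Godby-Needs scheme, with the fitted parameters R and Ω) is:&lt;br /&gt;
 \epsilon^{-1}_{GG&#039;}(q,\omega) \simeq \delta_{GG&#039;} + R_{GG&#039;}(q) \left[ \frac{1}{\omega - \Omega_{GG&#039;}(q) + i0^{+}} - \frac{1}{\omega + \Omega_{GG&#039;}(q) - i0^{+}} \right]&lt;br /&gt;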
==== Input file generation ====&lt;br /&gt;
Let&#039;s start by building up the input file for a GW/PPA calculation, including the calculation of the exchange self-energy. From &amp;lt;code&amp;gt;yambo -H&amp;lt;/code&amp;gt; you should understand that the correct option is &amp;lt;code&amp;gt;yambo -x -p p -g n&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo -x -p p -g n -F gw_ppa.in&lt;br /&gt;
&lt;br /&gt;
Let&#039;s modify the input file in the following way: &lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] = 40         Ry    # [XX] Exchange RL components&lt;br /&gt;
 [[Variables#VXCRLvcs|VXCRLvcs]] = 3187        RL      # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;                   # [X] IP/Hartree/ALDA/LRC/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 10 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]]= 1000          mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#PPAPntXp|PPAPntXp]] = 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GDamping=  0.10000     eV    # [GW] G[W] damping&lt;br /&gt;
 dScStep=  0.10000      eV    # [GW] Energy step to evaluate Z factors&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;               # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  7|  7|  8|  9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Brief explanation of some settings:&lt;br /&gt;
* Similar to the Hartree Fock study, we will concentrate on the convergence of the &#039;&#039;&#039;direct&#039;&#039;&#039; gap of the system. Hence we select the last occupied (8) and first unoccupied (9) bands for k-point number 7 in the &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable. &lt;br /&gt;
* We also keep &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; at its converged value of 40 Ry as obtained in the &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; tutorial.&lt;br /&gt;
* For the moment we keep fixed the plasmon pole energy &amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]]&amp;lt;/code&amp;gt; at its default value (=1 Hartree).&lt;br /&gt;
* We fix the direction of the electric field for the evaluation of the dielectric matrix to a generic value: &amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]]&amp;lt;/code&amp;gt;=(1,1,1).&lt;br /&gt;
* Later we will study convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;; for now just set them to the values indicated.&lt;br /&gt;
&lt;br /&gt;
==== Understanding the output ====&lt;br /&gt;
Let&#039;s look at the typical Yambo output. Run Yambo with an appropriate &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
   &lt;br /&gt;
 $ yambo -F gw_ppa.in -J 10b_1Ry&lt;br /&gt;
&lt;br /&gt;
In the standard output you can recognise the different steps of the calculation: calculation of the screening matrix (evaluation of the non-interacting and interacting response), calculation of the exchange self-energy, and finally the calculation of the correlation self-energy and quasiparticle energies. Moreover, information on memory usage and execution time is reported: &lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Dynamic Dielectric Matrix (PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;08s&amp;gt; Xo@q[3] |########################################| [100%] 03s(E) 03s(X)&lt;br /&gt;
 &amp;lt;08s&amp;gt; X@q[3] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [06] Bare local and non-local Exchange-Correlation&lt;br /&gt;
 &amp;lt;43s&amp;gt; EXS |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07] Dyson equation: Newton solver&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07.01] G0W0 (W PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; G0W0 (W PPA) |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; [07.02] QP properties and I/O&lt;br /&gt;
 &amp;lt;45s&amp;gt; [08] Game Over &amp;amp; Game summary&lt;br /&gt;
&lt;br /&gt;
Let&#039;s have a look at the report and output. The output file &#039;&#039;o-10b_1Ry.qp&#039;&#039; contains (for each band and k-point that we indicated in the input file) the values of the bare KS eigenvalue, its GW correction and the correlation part of the self energy:&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
  7     8       -0.411876  -0.567723   2.322443&lt;br /&gt;
  7     9        3.877976   2.413773  -2.232241&lt;br /&gt;
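&lt;br /&gt;
As a quick check, the direct quasiparticle gap at this k-point can be extracted from the two output lines above with a small awk one-liner (a sketch; the file name is the one produced by the &amp;lt;code&amp;gt;-J 10b_1Ry&amp;lt;/code&amp;gt; run):&lt;br /&gt;
 $ awk &#039;$1==7 &amp;amp;&amp;amp; $2==8 {ev=$3+$4}; $1==7 &amp;amp;&amp;amp; $2==9 {ec=$3+$4}; END {print &amp;quot;direct gap [eV]:&amp;quot;, ec-ev}&#039; o-10b_1Ry.qp&lt;br /&gt;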
&lt;br /&gt;
In the header you can see the details of the calculation; for instance, it reports that &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=1 Ry corresponds to 5 G-vectors: &lt;br /&gt;
&lt;br /&gt;
 #  X G`s            [used]:  5&lt;br /&gt;
&lt;br /&gt;
Other information can be found in the report file &#039;&#039;r-10b_1Ry_em1d_ppa_HF_and_locXC_gw0&#039;&#039;, such as the renormalization factor defined above, the value of the &#039;&#039;exchange&#039;&#039; self-energy (non-local XC) and of the DFT exchange-correlation potential (local XC): &lt;br /&gt;
&lt;br /&gt;
 [07.02] QP properties and I/O&lt;br /&gt;
  =============================&lt;br /&gt;
  Legend (energies in eV):&lt;br /&gt;
  - B  : Band       - Eo  : bare energy&lt;br /&gt;
  - E  : QP energy  - Z   : Renormalization factor&lt;br /&gt;
  - So : Sc(Eo)     - S   : Sc(E)&lt;br /&gt;
  - dSp: Sc derivative precision&lt;br /&gt;
  - lXC: Starting Local XC (DFT)&lt;br /&gt;
  -nlXC: Starting non-Local XC (HF)&lt;br /&gt;
  QP [eV] @ K [7] (iku): 0.000000 -0.500000  0.000000&lt;br /&gt;
   B=8 Eo= -0.41 E= -0.98 E-Eo= -0.56 Re(Z)=0.81 Im(Z)=-.2368E-2 nlXC=-19.13 lXC=-16.11 So= 2.322&lt;br /&gt;
   B=9 Eo=  3.88 E=  6.29 E-Eo=  2.41 Re(Z)=0.83 Im(Z)=-.2016E-2 nlXC=-5.536 lXC=-10.67 So=-2.232&lt;br /&gt;
&lt;br /&gt;
Extended information can also be found in the output by activating the &amp;lt;code&amp;gt;[[Variables#ExtendOut|ExtendOut]]&amp;lt;/code&amp;gt; flag in the input. This optional flag is added by passing the &amp;lt;code&amp;gt;-V qp&amp;lt;/code&amp;gt; verbosity option when building the input file. The plasmon pole screening, exchange self-energy and the quasiparticle energies are also saved in databases so that they can be reused in further runs:&lt;br /&gt;
&lt;br /&gt;
 $ ls ./10b_1Ry&lt;br /&gt;
 ndb.pp ndb.pp_fragment_1 ... ndb.HF_and_locXC ndb.QP&lt;br /&gt;
&lt;br /&gt;
===Convergence tests for a quasi particle calculation===&lt;br /&gt;
&lt;br /&gt;
Now we can check the convergence of the different variables entering in the expression of the correlation part of the self energy.&amp;lt;br&amp;gt; &lt;br /&gt;
First, we focus on the parameter governing the &#039;&#039;screening matrix&#039;&#039; you have already seen in the [[RPA/IP]] section. In contrast to the calculation of the [[RPA/IP]] dielectric function, where you considered either the optical limit or a finite q response (EELS), here the dielectric matrix will be calculated for &#039;&#039;all&#039;&#039; q-points determined by the choice of k-point sampling.&lt;br /&gt;
 &lt;br /&gt;
The parameters that need to be converged can be understood by looking at expression of the dielectric matrix:&lt;br /&gt;
[[File:Yambo-CH5.png|none|x30px|Yambo tutorial image]]&lt;br /&gt;
where &#039;&#039;&amp;amp;chi;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&#039;&#039; is given by&lt;br /&gt;
[[File:Dyson_rpa.png|none|x40px|Yambo tutorial image]]&lt;br /&gt;
and  &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; is given by&lt;br /&gt;
[[File:ChiO.png|none|x45px|Yambo tutorial image]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; : The dimension of the microscopic inverse matrix, related to [[Local fields]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; : The sum on bands (c,v) in the independent particle &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Converging Screening Parameters====&lt;br /&gt;
Here we will check the convergence of the gap starting from the variables controlling the screening reported above: the bands employed to build the RPA response function &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and the number of blocks (G,G&#039;) of the dielectric matrix ε&amp;lt;sup&amp;gt;-1&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;G,G&#039;&amp;lt;/sub&amp;gt;  &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
In the next section we will study convergence with respect to the sum over states (sum over &#039;&#039;m&#039;&#039; in the Σ&amp;lt;sub&amp;gt;c&amp;lt;/sub&amp;gt; expression); here let&#039;s fix &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; to a reasonable value (40 bands). &lt;br /&gt;
&lt;br /&gt;
Let&#039;s build a series of input files differing by the values of bands and block sizes in &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; considering &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in the range 10-50 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; in the range 1000 to 5000 mRy. To do this by hand, file by file, open the &#039;&#039;gw_ppa.in&#039;&#039; file in an editor and change to:&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]] = &#039;&#039;&#039;2000 mRy&#039;&#039;&#039;&lt;br /&gt;
while leaving the rest untouched. Repeat for 3000 mRy, 4000 mRy etc. Next, for each &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; change to:&lt;br /&gt;
 % [[Variables#BndsRnXp|BndsRnXp]]&lt;br /&gt;
   1 | 20 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
and repeat for 30, 40 and so on. Give a &#039;&#039;&#039;different name&#039;&#039;&#039; to each file: &#039;&#039;gw_ppa_Xb_YRy.in&#039;&#039; with X=10,20,30,40 and Y=1,2,3,4,5 Ry (i.e. 1000 to 5000 mRy).&lt;br /&gt;
&lt;br /&gt;
This is obviously quite tedious. However, you can automate both the input construction and code execution using bash or python scripts (indeed later you will learn how to use the yambo-python [http://www.yambo-code.org/wiki/index.php?title=GW_tutorial._Convergence_and_approximations_(BN)]tool for this task). For now, you can use the simple [[bash_scripts|generate_inputs_1.sh]] bash script to generate the required input files (after copying the script you need to type &amp;lt;code&amp;gt;$ chmod +x name_of_the_script.sh &amp;lt;/code&amp;gt;).&lt;br /&gt;
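&lt;br /&gt;
As an illustration, a minimal sketch of such a generation loop could look like the following (this is &#039;&#039;not&#039;&#039; the actual &#039;&#039;generate_inputs_1.sh&#039;&#039;; it assumes the reference &#039;&#039;gw_ppa.in&#039;&#039; still contains the lines &amp;quot;1 | 10 |&amp;quot; and &amp;quot;NGsBlkXp= 1000&amp;quot; shown above):&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Generate one input per (BndsRnXp, NGsBlkXp) combination from gw_ppa.in&lt;br /&gt;
 for nb in 10 20 30 40 ; do&lt;br /&gt;
   for ry in 1 2 3 4 5 ; do&lt;br /&gt;
     sed -e &amp;quot;s/1 | 10 |/1 | ${nb} |/&amp;quot; \&lt;br /&gt;
         -e &amp;quot;s/NGsBlkXp= 1000/NGsBlkXp= ${ry}000/&amp;quot; \&lt;br /&gt;
         gw_ppa.in &amp;gt; gw_ppa_${nb}b_${ry}Ry.in&lt;br /&gt;
   done&lt;br /&gt;
 done&lt;br /&gt;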
&lt;br /&gt;
Finally launch the calculations:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_10b_1Ry.in -J 10b_1Ry&lt;br /&gt;
 $ yambo -F gw_ppa_10b_2Ry.in -J 10b_2Ry&lt;br /&gt;
 ...&lt;br /&gt;
 $ yambo -F gw_ppa_40b_5Ry.in -J 40b_5Ry&lt;br /&gt;
&lt;br /&gt;
Once the jobs are terminated we can collect the quasiparticle energies for fixed &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in different files named e.g. &#039;&#039;10b.dat, 20b.dat&#039;&#039; etc. for plotting, by putting in separate columns: the energy cutoff; the size of the G blocks; the quasiparticle energy of the valence band; and that of the conduction band.&lt;br /&gt;
To do this e.g. for the &#039;&#039;10b.dat&#039;&#039; file you can type:&lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 8&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
to parse the valence band quasiparticle energies  and &lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 9&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
for the conduction band, and put all the data in the &#039;&#039;10b.dat&#039;&#039; file. As there are many files to process, you can use the [[bash_scripts|parse_qps.sh]] script to create the &#039;&#039;10b.dat&#039;&#039; file, editing the script to change the value of &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; for the other output files. &lt;br /&gt;
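&lt;br /&gt;
A possible sketch of such a parsing loop is shown here (this is &#039;&#039;not&#039;&#039; the actual &#039;&#039;parse_qps.sh&#039;&#039;; the output file names and the &amp;quot;X G`s&amp;quot; header line are assumed from the examples above):&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Collect, for BndsRnXp=10: cutoff (Ry), number of G-vectors, valence and conduction QP energies&lt;br /&gt;
 rm -f 10b.dat&lt;br /&gt;
 for ry in 1 2 3 4 5 ; do&lt;br /&gt;
   out=o-10b_${ry}Ry.qp&lt;br /&gt;
   ng=$(grep &#039;X G`s&#039; ${out} | awk &#039;{print $NF}&#039;)&lt;br /&gt;
   ev=$(grep &amp;quot;7 * 8&amp;quot; ${out} | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   ec=$(grep &amp;quot;7 * 9&amp;quot; ${out} | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   echo &amp;quot;${ry} ${ng} ${ev} ${ec}&amp;quot; &amp;gt;&amp;gt; 10b.dat&lt;br /&gt;
 done&lt;br /&gt;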
&lt;br /&gt;
Once we have collected all the quasiparticle values we can plot the gap, or the valence and conduction band energies separately, as a function of the block size or energy cutoff:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:3 w lp t &amp;quot; Valence BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:3 w lp t &amp;quot;Valence BndsRnXp=20&amp;quot;,.. &lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:4 w lp t &amp;quot; Conduction BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:4 w lp t &amp;quot;Conduction BndsRnXp=20&amp;quot;,..&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot; Gap BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot;gap BndsRnXp=20&amp;quot;,..  &lt;br /&gt;
or both using e.g. the [[gnuplot_scripts|ppa_gap.gnu]] gnuplot script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect screening parameters&amp;quot;&amp;gt;&lt;br /&gt;
File:ppa2.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:ppa3.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking at the plot we can see that:&lt;br /&gt;
* The two parameters are not totally independent (see e.g. the valence quasiparticle convergence), which is why we converge them simultaneously &lt;br /&gt;
* The gap (energy difference) converges faster than the single quasiparticle states&lt;br /&gt;
* The convergence criterion depends on the degree of accuracy we look for, but considering the approximations behind the calculations (plasmon-pole etc.), it is not always a good idea to enforce too strict a criterion.  &lt;br /&gt;
* Even if not fully converged, we can consider the upper limit &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;=30 and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=3 Ry to be reasonable parameters.&lt;br /&gt;
&lt;br /&gt;
Finally, notice that in recent versions of Yambo (4.5 and later) there is a terminator for the dielectric matrix that accelerates convergence with respect to the number of bands:&lt;br /&gt;
you can activate it with the &amp;lt;code&amp;gt;-V resp&amp;lt;/code&amp;gt; verbosity and by setting &amp;lt;code&amp;gt;XTermKind= &amp;quot;BG&amp;quot;&amp;lt;/code&amp;gt;. You can compare the convergence of the dielectric matrix with and without the terminator,&lt;br /&gt;
similarly to what happens for the self-energy (see the next section).&lt;br /&gt;
&lt;br /&gt;
====Converging the sum over states in the correlation self-energy====&lt;br /&gt;
From now on we will keep fixed these parameters and will perform a convergence study on the sum over state summation in the correlation self-energy (Σc) &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to use the screening previously calculated we can copy the plasmon pole parameters saved in the &amp;lt;code&amp;gt;30b_3Ry&amp;lt;/code&amp;gt; directory in the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory. In this way the screening will be read by Yambo and not calculated again:&lt;br /&gt;
&lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./SAVE/.&lt;br /&gt;
&lt;br /&gt;
(Note: you may have to delete these files before running the BSE tutorials)&lt;br /&gt;
&lt;br /&gt;
In order to use the databases we have to be sure to have the same plasmon-pole parameters in our input files.&lt;br /&gt;
Edit &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; and modify &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; in order to have a number of bands ranging from 10 to 80 in different files named &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;, &#039;&#039;gw_ppa_Gbnd20.in&#039;&#039; etc. You can also run the [[bash_scripts|generate_inputs_2.sh]] bash script to generate the required input files.&lt;br /&gt;
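&lt;br /&gt;
A quick alternative sketch (assuming the &amp;lt;code&amp;gt;GbndRnge&amp;lt;/code&amp;gt; block of &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; still contains the line &amp;quot;1 | 40 |&amp;quot;) is a one-line sed loop:&lt;br /&gt;
 $ for nb in 10 20 30 40 50 60 70 80 ; do sed &amp;quot;s/1 | 40 |/1 | ${nb} |/&amp;quot; gw_ppa_30b_3Ry.in &amp;gt; gw_ppa_Gbnd${nb}.in ; done&lt;br /&gt;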
&lt;br /&gt;
Next, launch yambo for each input:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and as done before we can inspect the obtained quasiparticle energies: &lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
for the valence band, and &lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
for the conduction band.&lt;br /&gt;
&lt;br /&gt;
Collect the results in a text file &#039;&#039;Gbnd_conv.dat&#039;&#039; containing: the number of Gbnd, the valence energy, and the conduction energy.&lt;br /&gt;
Now, as done before we can plot the valence and conduction quasiparticle levels separately as well as the gap, as a function of the number of bands used in the summation:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2 w lp lt 7  t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3 w lp lt 7  t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2) w lp lt 7  t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect sum over states in correlation self-energy&amp;quot;&amp;gt;&lt;br /&gt;
File:Gbnd_val.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:Gbnd_cond.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:Gbnd_gap.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inspecting the plot we can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is rather slow and many bands are needed to get converged results.&lt;br /&gt;
* As observed above the gap (energy difference) converges faster than the single quasiparticle state energies.&lt;br /&gt;
&lt;br /&gt;
====Accelerating the sum over states convergence in the correlation self-energy====&lt;br /&gt;
In general the convergence with respect to the sum over states can be very cumbersome. Here we show how it can be mitigated by using &lt;br /&gt;
a technique developed by F. Bruneval and X. Gonze &amp;lt;ref name=&amp;quot;BG&amp;quot;&amp;gt; F. Bruneval and X. Gonze, Phys. Rev. B 78, 085125 (2008)&amp;lt;/ref&amp;gt;. The basic idea is to replace the eigenenergies of the states that are not treated explicitly with a common energy, and to take into account, through the closure relation, all the states not explicitly included in the calculation.&lt;br /&gt;
To apply this technique in Yambo we need to activate the optional terminator variable [[Variables#GTermKind|GTermKind]]. You can edit the input file by setting:&lt;br /&gt;
 &lt;br /&gt;
 [[Variables#GTermKind|GTermKind]]= &amp;quot;BG&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for example in the file &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;. Now, repeat this operation also in all the other &#039;&#039;gw_ppa_GbndX.in&#039;&#039; input files.&lt;br /&gt;
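&lt;br /&gt;
If you prefer not to edit every file by hand, a hedged shortcut is to append the flag to each input (this assumes Yambo accepts the variable at the end of the file):&lt;br /&gt;
 $ for f in gw_ppa_Gbnd*.in ; do grep -q GTermKind $f || echo &#039;GTermKind= &amp;quot;BG&amp;quot;&#039; &amp;gt;&amp;gt; $f ; done&lt;br /&gt;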
We can now perform once again the same calculations with the terminator activated&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10_term&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20_term&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and collect the new results:&lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
&lt;br /&gt;
in a new file called &#039;&#039;Gbnd_conv_terminator.dat&#039;&#039;. Now we can plot the same quantities as before by looking at the effect of having introduced the terminator. &lt;br /&gt;
&lt;br /&gt;
 gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:2 w lp t &amp;quot;Valence with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:3 w lp t &amp;quot;Conduction with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:($3-$2) w lp t &amp;quot;Gap with terminator&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Effect of the terminator in convergence of QP energies with respect sum over states&amp;quot;&amp;gt;&lt;br /&gt;
File:val_t.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:cond_t.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:gap_t.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is accelerated, in particular for the single quasiparticle states.&lt;br /&gt;
* From the plot above we can see that &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;=40 is sufficient to have a converged gap&lt;br /&gt;
&lt;br /&gt;
===Beyond the plasmon pole approximation: a full frequency approach - real axis integration===&lt;br /&gt;
All the calculations performed up to now were based on the plasmon pole approximation (PPA). Now we remove this approximation by evaluating numerically the frequency integral in the expression of the correlation self-energy (Σc). To this aim we need to evaluate the screening on a grid of frequencies spanning an interval that depends on the electron-hole energy differences (energy differences between empty and occupied states) entering the sum over states. &lt;br /&gt;
Let&#039;s build the input file for a full frequency calculation by simply typing:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff.in -g n -p r&lt;br /&gt;
&lt;br /&gt;
and we can set the variables studied up to now to their converged values:&lt;br /&gt;
&lt;br /&gt;
 %[[Variables#BndsRnXd|BndsRnXd]]&lt;br /&gt;
 1 | 30 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXd|NGsBlkXd]]=3000 mRy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
 1 | 40 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]]=40000 mRy&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]] &lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 %QPkrange                      # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|8|9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and next vary, in different files (name them &#039;&#039;gw_ff10.in&#039;&#039; etc.), the number of frequencies at which we evaluate the screened Coulomb potential. This is done by setting the values of [[Variables#ETStpsXd|ETStpsXd]]=10, 50, 100, 150, 200, 250. &lt;br /&gt;
You can also run the [[bash_scripts|generate_inputs_3.sh]] bash script to generate the required input files.&lt;br /&gt;
&lt;br /&gt;
Next launch yambo:&lt;br /&gt;
 $ yambo -F gw_ff10.in -J ff10&lt;br /&gt;
 $ yambo -F gw_ff50.in -J ff50&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Clearly, as we are evaluating the screening for a large number of frequencies, these calculations will be heavier than the previous case, where the screening was computed at only two frequencies (zero and the plasmon-pole frequency). &lt;br /&gt;
As before, collect the valence and conduction bands as a function of the number of frequencies in a file called &#039;&#039;gw_ff.dat&#039;&#039; and plot the behaviour of the conduction and valence bands and the gap.&lt;br /&gt;
 &lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Real axis integration, convergences with respect the used number of frequencies&amp;quot;&amp;gt;&lt;br /&gt;
File:ff_v.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt;&lt;br /&gt;
File:ff_c.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:ff_g.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
&lt;br /&gt;
* Oscillations are still present, indicating that even more frequencies should be considered. In general, a real-axis calculation is very demanding. &lt;br /&gt;
* The gap obtained so far does not differ much from the one obtained at the plasmon-pole level (a difference of ~50 meV)&lt;br /&gt;
&lt;br /&gt;
As the real-axis calculation is computationally demanding, in recent years an alternative method called the multipole approximation (MPA) has been developed in Yambo. The MPA can be seen as an extension of the PPA, where a finite number of poles is used instead of just one. It has been shown that even a small number of poles gives MPA results on par with a full real-axis calculation. For your reference, a tutorial on the MPA method can be found at [[Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)|https://www.yambo-code.eu/wiki/index.php/Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)]]. For now, however, let us move on to the next section.&lt;br /&gt;
&lt;br /&gt;
===Secant Solver===&lt;br /&gt;
&lt;br /&gt;
The real-axis integration also makes it possible to go beyond the linear expansion when solving the quasiparticle equation. &lt;br /&gt;
The QP equation is a non-linear equation whose solution must be found using a suitable numerical algorithm. &lt;br /&gt;
[[File:Eqp_sec.png|none|x22px|caption]] &lt;br /&gt;
The most widely used method, based on the linearization of the self-energy operator, is the Newton method, which is the one we have used up to now. &lt;br /&gt;
Yambo can also perform a search of the QP energies using a non-linear iterative method based on the [https://en.wikipedia.org/wiki/Secant_method Secant iterative Method].&lt;br /&gt;
&lt;br /&gt;
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines to better approximate a root of a function &#039;&#039;f&#039;&#039;. The secant method can be thought of as a finite difference approximation of Newton&#039;s method. &lt;br /&gt;
The equation that defines the secant method is: &lt;br /&gt;
&lt;br /&gt;
[[File:secant_eq.png|none|x35px|caption]] &lt;br /&gt;
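In LaTeX notation, the secant update shown above reads:&lt;br /&gt;
 x_{n} = x_{n-1} - f(x_{n-1}) \, \frac{x_{n-1} - x_{n-2}}{f(x_{n-1}) - f(x_{n-2})}&lt;br /&gt;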
&lt;br /&gt;
The first two iterations of the secant method are shown in the following picture. The red curve shows the function f and the blue lines are the secants.&lt;br /&gt;
&lt;br /&gt;
[[File:Secant_method.png|center|250px|caption]] &lt;br /&gt;
&lt;br /&gt;
To see if there is any non-linear effect in the solution of the Dyson equation we compare the result of the calculation using the Newton solver as done before with the present case. &lt;br /&gt;
In order to use the secant method you need to edit one of the previous &#039;&#039;gw_ffN.in&#039;&#039; files, e.g. &#039;&#039;gw_ff100.in&#039;&#039;, and replace:&lt;br /&gt;
&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;&lt;br /&gt;
with &lt;br /&gt;
 DysSolver= &amp;quot;s&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Repeat the calculations in the same way as before:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff100.in -J ff100&lt;br /&gt;
&lt;br /&gt;
Note that now the screening will &#039;&#039;not&#039;&#039; be calculated again, as it has been stored in the &#039;&#039;ffN&#039;&#039; directories, as can be seen in the report file:&lt;br /&gt;
&lt;br /&gt;
 [05] Dynamical Dielectric Matrix&lt;br /&gt;
 ================================&lt;br /&gt;
 [RD./ff10//ndb.em1d]----&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./ff10//ndb.em1d_fragment_1]--------------&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Comparing the output files, e.g. for the case with 100 freqs:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp&#039;&#039;  &#039;&#039;&#039;Newton Solver:&#039;&#039;&#039;&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.41188     -0.08708      2.91254 &lt;br /&gt;
       7        9      3.877976     1.421968    -3.417357 &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp_01&#039;&#039; &#039;&#039;&#039;Secant Solver:&#039;&#039;&#039;&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       7        8     -0.41188     -0.08715      2.93518 &lt;br /&gt;
       7        9      3.877976     1.401408    -3.731649 &lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
From the comparison, we see that the effect is of the order of 20 meV, of the same order of magnitude as the accuracy of GW calculations.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Interpolating Band Structures==&lt;br /&gt;
Up to now we have checked convergence for the gap. Now we want to calculate the quasiparticle corrections across the Brillouin zone in order to visualize the entire band structure along a path connecting high symmetry points.&lt;br /&gt;
&lt;br /&gt;
To do that we start by calculating the QP correction in the plasmon-pole approximation for all the k points of our sampling and for a number of bands around the gap. You can use a previous input file or generate a new one: &amp;lt;code&amp;gt; yambo -F gw_ppa_all_Bz.in -x -p p -g n &amp;lt;/code&amp;gt; and set the parameters found in the previous tests:&lt;br /&gt;
&lt;br /&gt;
 EXXRLvcs=  40        Ry &lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 30 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 3000            mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
and we calculate it for all the available kpoints by setting:&lt;br /&gt;
 %QPkrange                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  1| 14|  6|11|&lt;br /&gt;
 %&lt;br /&gt;
 &lt;br /&gt;
Note that, as we have already calculated the screening with these parameters, you can save time and reuse that database: either run in the previous directory by using &amp;lt;code&amp;gt; -J 30b_3Ry &amp;lt;/code&amp;gt;, or, if you prefer to put the new databases in the new all_Bz directory, create it and copy the screening databases there:&lt;br /&gt;
&lt;br /&gt;
 $ mkdir all_Bz &lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./all_Bz/&lt;br /&gt;
&lt;br /&gt;
and launch the calculation:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_all_Bz.in -J all_Bz&lt;br /&gt;
&lt;br /&gt;
Now we can inspect the output and see that it contains the correction for all the k points for the bands we asked:&lt;br /&gt;
&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       1        6      -1.299712    -0.219100     3.788044&lt;br /&gt;
       1        7      -1.296430    -0.241496     3.788092&lt;br /&gt;
       1        8      -1.296420    -0.243115     3.785947&lt;br /&gt;
       1        9       4.832399     0.952386    -3.679259&lt;br /&gt;
       1        10      10.76416     2.09915     -4.38743&lt;br /&gt;
       1        11      11.36167     2.48053     -3.91021&lt;br /&gt;
....&lt;br /&gt;
By plotting some of the &#039;&#039;o-all_Bz.qp&#039;&#039; columns it is possible to discuss some physical properties of the hBN QPs. Using columns 3 and (3+4), i.e. plotting the GW energies with respect to the LDA energies, we can deduce the band gap renormalization and the stretching of the conduction/valence bands:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p  t &amp;quot;Eqp vs Elda&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[File:EqpvE0.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
Essentially we can see that the effect of the GW self-energy is the opening of the gap and a linear stretching of the conduction/valence bands, which can be estimated by performing a linear fit of the positive and negative energies (the zero is set at the top of the valence band). &lt;br /&gt;
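&lt;br /&gt;
One possible way to estimate the two stretching slopes is gnuplot&#039;s &amp;lt;code&amp;gt;fit&amp;lt;/code&amp;gt; command (a sketch, assuming the LDA zero separates the valence energies, negative, from the conduction ones, positive):&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; fv(x) = av*x + bv ; fc(x) = ac*x + bc&lt;br /&gt;
 gnuplot&amp;gt; fit [*:0] fv(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via av,bv    # valence stretching = av&lt;br /&gt;
 gnuplot&amp;gt; fit [0:*] fc(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via ac,bc    # conduction stretching = ac&lt;br /&gt;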
&lt;br /&gt;
In order to calculate the band structure, however, we need to interpolate the values we have calculated above on a given path. In Yambo the interpolation is done by the executable &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; (Yambo Post Processing).&lt;br /&gt;
By typing: &lt;br /&gt;
 $ ypp -h  &lt;br /&gt;
you will recognize that in order to interpolate the bands we need to build a ypp input file using&lt;br /&gt;
 $ ypp -s b&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Before editing the &#039;&#039;ypp.in&#039;&#039; input file and running the interpolation, it is important to know that &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; uses an algorithm &amp;lt;ref&amp;gt; Warren E. Pickett, Henry Krakauer, and Philip B. Allen Phys. Rev. B 38, 2721 &amp;lt;/ref&amp;gt;  that cannot be used in presence of time-reversal (TR) symmetry. &lt;br /&gt;
As a first step we therefore remove the TR symmetry by typing:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -y&lt;br /&gt;
&lt;br /&gt;
and we uncomment the corresponding line to remove the TR.&lt;br /&gt;
&lt;br /&gt;
 fixsyms                      # [R] Reduce Symmetries&lt;br /&gt;
 #RmAllSymm                   # Remove all symmetries&lt;br /&gt;
 #RmTimeRev                   # Remove Time Reversal&lt;br /&gt;
&lt;br /&gt;
and launch&lt;br /&gt;
 &lt;br /&gt;
 $ ypp&lt;br /&gt;
&lt;br /&gt;
This will create a new directory called &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; where a &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory containing the electronic structure in the absence of TR is present. &lt;br /&gt;
We will calculate the band structure in the &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
 $ cd FixSymm&lt;br /&gt;
&lt;br /&gt;
After having performed the usual setup&lt;br /&gt;
&lt;br /&gt;
 $ yambo&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
we can generate the input for the band interpolation:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -s b -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
and edit the &#039;&#039;ypp_bands.in&#039;&#039; file:&lt;br /&gt;
&lt;br /&gt;
 electrons                    # [R] Electrons (and holes)&lt;br /&gt;
 bnds                         # [R] Bands&lt;br /&gt;
 INTERP_mode= &amp;quot;BOLTZ&amp;quot;         # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach) &lt;br /&gt;
 OutputAlat= 4.716000           # [a.u.] Lattice constant used for &amp;quot;alat&amp;quot; ouput format&lt;br /&gt;
 cooIn= &amp;quot;rlu&amp;quot;                   # Points coordinates (in) cc/rlu/iku/alat&lt;br /&gt;
 cooOut= &amp;quot;rlu&amp;quot;     &lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   1 | 100 |                   # Number of bands&lt;br /&gt;
 %&lt;br /&gt;
 INTERP_Shell_Fac= 20.00000     # The bigger it is a higher number of shells is used&lt;br /&gt;
 CIRCUIT_E_DB_path= &amp;quot;none&amp;quot;      # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)&lt;br /&gt;
 BANDS_path= &amp;quot;&amp;quot;                 # BANDS path points labels (G,M,K,L...)&lt;br /&gt;
 BANDS_steps= 10  &lt;br /&gt;
 #BANDS_built_in                # Print the bands of the generating points of the circuit using the nearest internal point&lt;br /&gt;
 %BANDS_kpts   &lt;br /&gt;
 % &lt;br /&gt;
&lt;br /&gt;
We modify the following lines:&lt;br /&gt;
 BANDS_steps=30&lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   6 | 11 |                   # Number of bands &lt;br /&gt;
 %&lt;br /&gt;
 %BANDS_kpts                    # K points of the bands circuit&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.00000 |0.00000 |0.00000 |&lt;br /&gt;
  0.50000 |-.50000 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.50000 |&lt;br /&gt;
  0.00000 |0.00000 |0.50000 |&lt;br /&gt;
  0.50000 |-.50000 |0.50000 |&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
which means we assign 30 points to each segment, we ask to interpolate 3 occupied and 3 empty bands, and we assign a path passing through the high symmetry points K Γ M K H A L.&lt;br /&gt;
Launching:&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
will produce the output file &#039;&#039;o.bands_interpolated&#039;&#039; containing:&lt;br /&gt;
&lt;br /&gt;
                 &lt;br /&gt;
 #&lt;br /&gt;
 #   |k|        b6         b7         b8         b9         b10        b11        kx         ky         kz&lt;br /&gt;
 #&lt;br /&gt;
 #&lt;br /&gt;
     0.00000   -7.22092   -0.13402   -0.13395    4.67691    4.67694   10.08905    0.33300   -0.66667    0.00000&lt;br /&gt;
     0.03725   -7.18857   -0.17190   -0.12684    4.66126    4.71050   10.12529    0.32190   -0.64445    0.00000&lt;br /&gt;
...&lt;br /&gt;
 &lt;br /&gt;
and we can plot the bands using gnuplot:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o.bands_interpolated&amp;quot; u 0:2 w l, &amp;quot;o.bands_interpolated&amp;quot; u 0:3 w l, ...&lt;br /&gt;
&lt;br /&gt;
[[File:bands_lda.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
and you can recognize the index of the high symmetry point by inspecting the last three columns.&lt;br /&gt;
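&lt;br /&gt;
If your gnuplot version supports iteration (4.4 or later), all six interpolated bands can be drawn in a single command; the column indices follow the header shown above:&lt;br /&gt;
 gnuplot&amp;gt; p for [i=2:7] &amp;quot;o.bands_interpolated&amp;quot; u 1:i w l notitle&lt;br /&gt;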
Note that up to now we have interpolated the LDA band structure. In order to plot the GW band structure, we need to tell &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; in the input file where the &#039;&#039;ndb.QP&#039;&#039; database is found. This is achieved by adding in the &#039;&#039;ypp_bands.in&#039;&#039; file the line:&lt;br /&gt;
&lt;br /&gt;
  GfnQPdb= &amp;quot;E &amp;lt; ./all_Bz/ndb.QP&amp;quot;&lt;br /&gt;
&lt;br /&gt;
and relaunch &lt;br /&gt;
&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
Now the file &#039;&#039;o.bands_interpolated_01&#039;&#039; contains the GW interpolated band structure. We can plot the LDA and GW band structure together by using the gnuplot script [[gnuplot_scripts|bands.gnu]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines  widths=500px heights=500px  perrow=2 caption=&amp;quot;Band structure of bulk hBN&amp;quot;&amp;gt;&lt;br /&gt;
File:hBN_bands.png| LDA and GW band structure &lt;br /&gt;
File:hBN_bands_lit.png| LDA and GW band structure from Ref. &amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*As expected the effect of the GW correction is to open the gap.&lt;br /&gt;
*Comparing the obtained band structure with the one found in the literature by Arnaud and coworkers &amp;lt;ref name=&amp;quot;Arnaud&amp;quot;&amp;gt; B. Arnaud, S. Lebegue, P. Rabiller, and M. Alouani, Phys. Rev. Lett. 96, 026402 (2006)&amp;lt;/ref&amp;gt; we find very good qualitative agreement. &lt;br /&gt;
*Quantitatively we find a smaller gap: about 5.2 eV (indirect) and 5.7 eV (direct), while Ref.&amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; reports 5.95 eV for the indirect gap and a minimum direct gap of 6.47 eV. Other values are also reported in the literature, depending on the pseudopotentials, starting functional and type of self-consistency used (see below). &lt;br /&gt;
*The present tutorial has been run with a small k-point grid, which is an important parameter, so convergence with respect to the k-point sampling has to be validated.&lt;br /&gt;
&lt;br /&gt;
==Step 4: Summary of the convergence parameters==&lt;br /&gt;
We have calculated the band structure of hBN starting from a DFT calculation; here we summarize the main variables we have checked to achieve convergence:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; # [XX] Exchange RL components&lt;br /&gt;
Number of G-vectors in the exchange. This number should be checked carefully. Generally a large number is needed, as the QP energies show a slow convergence. The calculation of the exchange part is rather fast. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; #[Xp] Polarization function bands&lt;br /&gt;
Number of bands in the independent response function from which the dielectric matrix is calculated. This parameter also has to be checked carefully, together with NGsBlkXp, as the two variables are interconnected.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;  # [Xp] Response block size&lt;br /&gt;
Number of G-vector blocks in the dielectric matrix. This parameter also has to be checked carefully, together with BndsRnXp. A large number of bands and blocks can make the calculation very demanding.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]] &amp;lt;/code&amp;gt; # [Xp] [cc] Electric Field&lt;br /&gt;
Direction of the electric field for the calculation of the q=0 component of the dielectric function e(q,w). In a bulk system it can be set to (1,1,1); attention must be paid for non-3D systems.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]] &amp;lt;/code&amp;gt; # [Xp] Plasmon pole imaginary energy: this is the second frequency used to fit the Godby-Needs plasmon-pole model (PPM). If the results change appreciably when this frequency is varied, the PPM is not adequate for your calculation and you need to go beyond it, e.g. with a real-axis integration. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]] &amp;lt;/code&amp;gt; # [GW] G[W] bands range&lt;br /&gt;
Number of bands used to expand the Green&#039;s function. This number is usually larger than the number of bands used to calculate the dielectric matrix. Single quasiparticle energies converge slowly with respect to GbndRnge; energy differences behave better. You can use the terminator technique to mitigate the slow convergence. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GDamping|GDamping]] &amp;lt;/code&amp;gt; # [GW] G[W] damping&lt;br /&gt;
Small damping (the delta parameter) in the Green&#039;s function definition. The final result should not depend on it; it is usually set to 0.1 eV.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#dScStep|dScStep]] &amp;lt;/code&amp;gt; # [GW] &lt;br /&gt;
Energy step to evaluate Z factors&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#DysSolver|DysSolver]] &amp;lt;/code&amp;gt; # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
Algorithm used to solve the Dyson equation: &amp;quot;n&amp;quot; for the Newton linearization, &amp;quot;s&amp;quot; for the non-linear secant method.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GTermKind|GTermKind]] &amp;lt;/code&amp;gt; # [GW] GW terminator &lt;br /&gt;
Terminator for the self-energy&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;. We have seen how this speeds up the convergence with respect to the number of empty bands.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]] &amp;lt;/code&amp;gt; # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
K-points and band range where you want to calculate the GW correction. The syntax is&lt;br /&gt;
first kpoint | last kpoint | first band | last band&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8666</id>
		<title>GW on h-BN (standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8666"/>
		<updated>2025-05-18T13:34:07Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 3: Interpolating Band Structures */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a modified version of the full tutorial on GW computations present on the Yambo wiki. Later, you can have a look at the extended version at [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will learn how to:&lt;br /&gt;
* Calculate quasi-particle corrections in the Hartree-Fock approximation &lt;br /&gt;
* Calculate quasi-particle corrections in the GW approximation &lt;br /&gt;
* Choose the input parameters for a meaningful converged calculation&lt;br /&gt;
* Plot a band structure including quasi-particle corrections&lt;br /&gt;
We will use bulk hBN as an example system. Before starting, you need to obtain the appropriate tarball. See instructions on the [[Tutorials|main tutorials page]]. &amp;lt;br&amp;gt;&lt;br /&gt;
We strongly recommend that you first complete the [[First steps: a walk through from DFT to optical properties]] tutorial.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &#039;&#039;&#039;Prerequisites&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
At this stage, you should have already completed the following modules:&lt;br /&gt;
* [[Generating the Yambo databases|Generating the Yambo databases]]&lt;br /&gt;
* Step 2 Run [[Initialization]] tutorial&lt;br /&gt;
Now we can start to --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The aim of the present tutorial is to obtain quasiparticle corrections to the energy levels using many-body perturbation theory (MBPT). &amp;lt;br&amp;gt;&lt;br /&gt;
The general non-linear quasiparticle equation reads:&lt;br /&gt;
[[File:Eqp_1.png|none|x26px|caption]] &lt;br /&gt;
As a first step we want to evaluate the self energy Σ entering in the quasiparticle equation. In the GW approach the self-energy can be separated into two components: a static term called the exchange self-energy (Σx), and a dynamical term (energy dependent) called the correlation self-energy (Σc):&lt;br /&gt;
[[File:Sigma.png|none|x25px|caption]]&lt;br /&gt;
We will treat these two terms separately and demonstrate how to set the most important variables for calculating each term.&lt;br /&gt;
In practice we will compute  the quasi-particle corrections to the one particle Kohn-Sham eigenvalues obtained through a DFT calculation. &lt;br /&gt;
&lt;br /&gt;
The steps are the following:&lt;br /&gt;
&lt;br /&gt;
==Step 1: The Exchange Self Energy or HF quasi-particle correction==&lt;br /&gt;
&lt;br /&gt;
We start by evaluating the exchange Self-Energy and the corresponding Quasiparticle energies (Hartree-Fock energies). &lt;br /&gt;
Follow the module on &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; and then return to this tutorial &#039;&#039;How to obtain the quasiparticle band structure of a bulk material: h-BN&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Step 2: The Correlation Self-Energy and Quasiparticle Energies==&lt;br /&gt;
Once we have calculated the exchange part, we next turn our attention to the more demanding dynamical part. The correlation part of the self-energy in a plane wave representation reads:&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
In the expression for the correlation self energy, we have (1) a summation over bands, (2) an integral over the Brillouin Zone, and (3) a sum over the G vectors. In contrast with the case of Σx, the summation over bands extends over &#039;&#039;all&#039;&#039; bands (including the unoccupied ones), and so convergence tests are needed. Another important difference is that the Coulomb interaction is now screened so a fundamental ingredient is the evaluation of the dynamical dielectric matrix. The expression for the dielectric matrix, calculated at the RPA level and including local field effects, has been already treated in the [[Local fields|Local fields]] tutorial.&lt;br /&gt;
&lt;br /&gt;
In the following, we will see two ways to take into account the dynamical effects. First, we will see how to set the proper parameters to obtain a model dielectric function based on a widely used approximation, which models the energy dependence of each component of the dielectric matrix with a single pole function. &lt;br /&gt;
Secondly, we will see how to perform calculations by evaluating the dielectric matrix on a regular grid of frequencies. &lt;br /&gt;
&lt;br /&gt;
Once the correlation part of the self-energy is calculated, we will check the convergence of the different parameters with respect to some final quantity, such as the gap. &lt;br /&gt;
&lt;br /&gt;
After computing the frequency dependent self-energy, we will discover that in order to solve the quasiparticle equation we will need to know its value &#039;&#039;at the value of the quasiparticle itself&#039;&#039;. In the following, unless explicitly stated, we will solve the non-linear quasi-particle equation at first order, by expanding the self-energy around the Kohn-Sham eigenvalue. In this way the quasiparticle equation reads:&lt;br /&gt;
&lt;br /&gt;
[[File:Eqp_2.png|none|x26px|caption]] &lt;br /&gt;
&lt;br /&gt;
where the normalization factor Z is defined as:&lt;br /&gt;
&lt;br /&gt;
[[File:z_fac.png|none|x40px|caption]] &lt;br /&gt;
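&lt;br /&gt;
For reference, the linearized quasiparticle equation and the Z factor shown in the two images above can be sketched in standard MBPT notation (conventional definitions, not necessarily Yambo&#039;s exact internal conventions) as:&lt;br /&gt;
 E^{QP}_{nk} = \epsilon_{nk} + Z_{nk} \langle nk| \Sigma(\epsilon_{nk}) - V_{xc} |nk\rangle ,&lt;br /&gt;
 Z_{nk} = \Big[ 1 - \partial\Sigma(\omega)/\partial\omega \big|_{\omega=\epsilon_{nk}} \Big]^{-1}&lt;br /&gt;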
&lt;br /&gt;
===The Plasmon Pole approximation===&lt;br /&gt;
As stated above, the basic idea of the plasmon-pole approximation is to approximate the frequency dependence of the dielectric matrix with a single pole function of the form:&lt;br /&gt;
[[File:ppa.png|none|x26px|caption]]&lt;br /&gt;
The two parameters R&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; and Ω&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; are obtained by a fit (for each component), after having calculated the RPA dielectric matrix at two given frequencies.&lt;br /&gt;
Yambo calculates the dielectric matrix in the static limit ( ω=0) and at a user defined frequency called the plasmon-pole frequency (ω=iωp). &lt;br /&gt;
Such an approximation has the big computational advantage of calculating the dielectric matrix for only two frequencies and leads to an analytical expression for the frequency integral of the correlation self-energy.&lt;br /&gt;
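&lt;br /&gt;
As a rough guide, the generic single-pole (Godby-Needs-like) form behind this approximation can be sketched as (the exact convention used internally by Yambo may differ):&lt;br /&gt;
 \epsilon^{-1}_{GG&#039;}(q,\omega) \simeq \delta_{GG&#039;} + R_{GG&#039;}(q) \left[ \frac{1}{\omega - \Omega_{GG&#039;}(q) + i\eta} - \frac{1}{\omega + \Omega_{GG&#039;}(q) - i\eta} \right]&lt;br /&gt;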
==== Input file generation ====&lt;br /&gt;
Let&#039;s start by building up the input file for a GW/PPA calculation, including the calculation of the exchange self-energy. From &amp;lt;code&amp;gt;yambo -H&amp;lt;/code&amp;gt; you should understand that the correct option is &amp;lt;code&amp;gt;yambo -x -p p -g n&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo -x -p p -g n -F gw_ppa.in&lt;br /&gt;
&lt;br /&gt;
Let&#039;s modify the input file in the following way: &lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] = 40         Ry    # [XX] Exchange RL components&lt;br /&gt;
 [[Variables#VXCRLvcs|VXCRLvcs]] = 3187        RL      # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;                   # [X] IP/Hartree/ALDA/LRC/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 10 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]]= 1000          mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#PPAPntXp|PPAPntXp]] = 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GDamping=  0.10000     eV    # [GW] G[W] damping&lt;br /&gt;
 dScStep=  0.10000      eV    # [GW] Energy step to evaluate Z factors&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;               # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  7|  7|  8|  9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Brief explanation of some settings:&lt;br /&gt;
* Similar to the Hartree Fock study, we will concentrate on the convergence of the &#039;&#039;&#039;direct&#039;&#039;&#039; gap of the system. Hence we select the last occupied (8) and first unoccupied (9) bands for k-point number 7 in the &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable. &lt;br /&gt;
* We also keep &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; at its converged value of 40 Ry as obtained in the &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; tutorial.&lt;br /&gt;
* For the moment we keep fixed the plasmon pole energy &amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]]&amp;lt;/code&amp;gt; at its default value (=1 Hartree).&lt;br /&gt;
* We keep fixed the direction of the electric field for the evaluation of the dielectric matrix to a non-specific value: &amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]]&amp;lt;/code&amp;gt;=(1,1,1).&lt;br /&gt;
* Later we will study convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;; for now just set them to the values indicated.&lt;br /&gt;
&lt;br /&gt;
==== Understanding the output ====&lt;br /&gt;
Let&#039;s look at the typical Yambo output. Run Yambo with an appropriate &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
   &lt;br /&gt;
 $ yambo -F gw_ppa.in -J 10b_1Ry&lt;br /&gt;
&lt;br /&gt;
In the standard output you can recognise the different steps of the calculations: calculation of the screening matrix (evaluation of the non-interacting and interacting response), calculation of the exchange self-energy, and finally the calculation of the correlation self-energy and quasiparticle energies. Moreover, information on memory usage and execution time is reported: &lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Dynamic Dielectric Matrix (PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;08s&amp;gt; Xo@q[3] |########################################| [100%] 03s(E) 03s(X)&lt;br /&gt;
 &amp;lt;08s&amp;gt; X@q[3] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [06] Bare local and non-local Exchange-Correlation&lt;br /&gt;
 &amp;lt;43s&amp;gt; EXS |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07] Dyson equation: Newton solver&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07.01] G0W0 (W PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; G0W0 (W PPA) |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; [07.02] QP properties and I/O&lt;br /&gt;
 &amp;lt;45s&amp;gt; [08] Game Over &amp;amp; Game summary&lt;br /&gt;
&lt;br /&gt;
Let&#039;s have a look at the report and output. The output file &#039;&#039;o-10b_1Ry.qp&#039;&#039; contains (for each band and k-point that we indicated in the input file) the values of the bare KS eigenvalue, its GW correction and the correlation part of the self energy:&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
  7     8       -0.411876  -0.567723   2.322443&lt;br /&gt;
  7     9        3.877976   2.413773  -2.232241&lt;br /&gt;
&lt;br /&gt;
In the header you can see the details of the calculation; for instance, it reports that &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=1 Ry corresponds to 5 G-vectors: &lt;br /&gt;
&lt;br /&gt;
 #  X G`s            [used]:  5&lt;br /&gt;
&lt;br /&gt;
Other information can be found in the report file &#039;&#039;r-10b_1Ry_em1d_ppa_HF_and_locXC_gw0&#039;&#039;, such as the renormalization factor defined above, the value of the &#039;&#039;exchange&#039;&#039; self-energy (non-local XC) and of the DFT exchange-correlation potential (local XC): &lt;br /&gt;
&lt;br /&gt;
 [07.02] QP properties and I/O&lt;br /&gt;
  =============================&lt;br /&gt;
  Legend (energies in eV):&lt;br /&gt;
  - B  : Band       - Eo  : bare energy&lt;br /&gt;
  - E  : QP energy  - Z   : Renormalization factor&lt;br /&gt;
  - So : Sc(Eo)     - S   : Sc(E)&lt;br /&gt;
  - dSp: Sc derivative precision&lt;br /&gt;
  - lXC: Starting Local XC (DFT)&lt;br /&gt;
  -nlXC: Starting non-Local XC (HF)&lt;br /&gt;
  QP [eV] @ K [7] (iku): 0.000000 -0.500000  0.000000&lt;br /&gt;
   B=8 Eo= -0.41 E= -0.98 E-Eo= -0.56 Re(Z)=0.81 Im(Z)=-.2368E-2 nlXC=-19.13 lXC=-16.11 So= 2.322&lt;br /&gt;
   B=9 Eo=  3.88 E=  6.29 E-Eo=  2.41 Re(Z)=0.83 Im(Z)=-.2016E-2 nlXC=-5.536 lXC=-10.67 So=-2.232&lt;br /&gt;
&lt;br /&gt;
Extended information can also be found in the output by activating the &amp;lt;code&amp;gt;[[Variables#ExtendOut|ExtendOut]]&amp;lt;/code&amp;gt; flag in the input. This is an optional flag that can be activated by adding the &amp;lt;code&amp;gt;-V qp&amp;lt;/code&amp;gt; verbosity option when building the input file. The plasmon pole screening, the exchange self-energy and the quasiparticle energies are also saved in databases so that they can be reused in further runs:&lt;br /&gt;
&lt;br /&gt;
 $ ls ./10b_1Ry&lt;br /&gt;
 ndb.pp ndb.pp_fragment_1 ... ndb.HF_and_locXC ndb.QP&lt;br /&gt;
&lt;br /&gt;
===Convergence tests for a quasi particle calculation===&lt;br /&gt;
&lt;br /&gt;
Now we can check the convergence of the different variables entering in the expression of the correlation part of the self energy.&amp;lt;br&amp;gt; &lt;br /&gt;
First, we focus on the parameters governing the &#039;&#039;screening matrix&#039;&#039; that you have already seen in the [[RPA/IP]] section. In contrast to the calculation of the [[RPA/IP]] dielectric function, where you considered either the optical limit or a finite q response (EELS), here the dielectric matrix will be calculated for &#039;&#039;all&#039;&#039; q-points determined by the choice of k-point sampling.&lt;br /&gt;
 &lt;br /&gt;
The parameters that need to be converged can be understood by looking at the expression of the dielectric matrix:&lt;br /&gt;
[[File:Yambo-CH5.png|none|x30px|Yambo tutorial image]]&lt;br /&gt;
where &#039;&#039;&amp;amp;chi;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&#039;&#039; is given by&lt;br /&gt;
[[File:Dyson_rpa.png|none|x40px|Yambo tutorial image]]&lt;br /&gt;
and  &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; is given by&lt;br /&gt;
[[File:ChiO.png|none|x45px|Yambo tutorial image]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; : The dimension of the microscopic inverse matrix, related to [[Local fields]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; : The sum on bands (c,v) in the independent particle &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&lt;br /&gt;
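&lt;br /&gt;
In compact notation, the relations shown in the images above can be sketched as (schematic form, omitting prefactors):&lt;br /&gt;
 \epsilon^{-1}_{GG&#039;}(q,\omega) = \delta_{GG&#039;} + v_{G}(q)\, \chi_{GG&#039;}(q,\omega) , \qquad \chi = \chi^{0} + \chi^{0} v \chi ,&lt;br /&gt;
with \chi^{0}_{GG&#039;} the independent-particle response built from the (c,v) transitions.&lt;br /&gt;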
&lt;br /&gt;
&lt;br /&gt;
====Converging Screening Parameters====&lt;br /&gt;
Here we will check the convergence of the gap starting from the variables controlling the screening reported above: the bands employed to build the RPA response function &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and the size of the (G,G&#039;) block of the dielectric matrix ε&amp;lt;sup&amp;gt;-1&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;G,G&#039;&amp;lt;/sub&amp;gt;  &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
In the next section we will study convergence with respect to the sum over states (the sum over &#039;&#039;m&#039;&#039; in the Σ&amp;lt;sub&amp;gt;c&amp;lt;/sub&amp;gt; expression); here let&#039;s fix &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; to a reasonable value (40 bands). &lt;br /&gt;
&lt;br /&gt;
Let&#039;s build a series of input files differing by the values of bands and block sizes in &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; considering &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in the range 10-50 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; in the range 1000 to 5000 mRy. To do this by hand, file by file, open the &#039;&#039;gw_ppa.in&#039;&#039; file in an editor and change to:&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]] = &#039;&#039;&#039;2000 mRy&#039;&#039;&#039;&lt;br /&gt;
while leaving the rest untouched. Repeat for 3000 mRy, 4000 mRy etc. Next, for each &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; change to:&lt;br /&gt;
 % [[Variables#BndsRnXp|BndsRnXp]]&lt;br /&gt;
   1 | 20 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
and repeat for 30, 40 and so on. Give a &#039;&#039;&#039;different name&#039;&#039;&#039; to each file: &#039;&#039;gw_ppa_Xb_YRy.in&#039;&#039; with X=10,20,30,40 and Y=1,2,3,4,5 Ry (i.e. 1000 to 5000 mRy).&lt;br /&gt;
&lt;br /&gt;
This is obviously quite tedious. However, you can automate both the input construction and the code execution using bash or python scripts (indeed, later you will learn how to use the yambo-python tool [http://www.yambo-code.org/wiki/index.php?title=GW_tutorial._Convergence_and_approximations_(BN)] for this task). For now, you can use the simple [[bash_scripts|generate_inputs_1.sh]] bash script to generate the required input files (after copying the script, make it executable with &amp;lt;code&amp;gt;chmod +x name_of_the_script.sh&amp;lt;/code&amp;gt;).&lt;br /&gt;
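&lt;br /&gt;
If you prefer to write your own generator, a minimal sketch could look like the following (purely illustrative, not the actual content of &#039;&#039;generate_inputs_1.sh&#039;&#039;; the sed patterns assume the &amp;quot;NGsBlkXp= ...&amp;quot; and &amp;quot;1 | 10 |&amp;quot; lines exactly as they appear in the reference &#039;&#039;gw_ppa.in&#039;&#039; above):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: generate gw_ppa_Xb_YRy.in files by editing BndsRnXp and NGsBlkXp&lt;br /&gt;
 # in the reference input gw_ppa.in prepared above&lt;br /&gt;
 for bands in 10 20 30 40; do&lt;br /&gt;
   for ry in 1 2 3 4 5; do&lt;br /&gt;
     sed -e &amp;quot;s/NGsBlkXp.*/NGsBlkXp= ${ry}000 mRy/&amp;quot; -e &amp;quot;s/1 | 10 |/1 | ${bands} |/&amp;quot; gw_ppa.in &amp;gt; gw_ppa_${bands}b_${ry}Ry.in&lt;br /&gt;
   done&lt;br /&gt;
 done&lt;br /&gt;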
&lt;br /&gt;
Finally launch the calculations:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_10b_1Ry.in -J 10b_1Ry&lt;br /&gt;
 $ yambo -F gw_ppa_10b_2Ry.in -J 10b_2Ry&lt;br /&gt;
 ...&lt;br /&gt;
 $ yambo -F gw_ppa_40b_5Ry.in -J 40b_5Ry&lt;br /&gt;
&lt;br /&gt;
Once the jobs have finished we can collect the quasiparticle energies for fixed &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in different files named e.g. &#039;&#039;10b.dat, 20b.dat&#039;&#039; etc. for plotting, by putting in separate columns: the energy cutoff; the size of the G block; the quasiparticle energy of the valence band; and that of the conduction band.&lt;br /&gt;
To do this e.g. for the &#039;&#039;10b.dat&#039;&#039; file you can type:&lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 8&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
to parse the valence band quasiparticle energies  and &lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 9&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
for the conduction band, and put all the data in the &#039;&#039;10b.dat&#039;&#039; file. As there are many files to process, you can use the [[bash_scripts|parse_qps.sh]] script to create the &#039;&#039;10b.dat&#039;&#039; file, and edit the script changing the value of &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; for the other output files. &lt;br /&gt;
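&lt;br /&gt;
As an illustration, a minimal parsing loop could look like the following sketch (it assumes output files named &#039;&#039;o-10b_1Ry.qp&#039;&#039; ... &#039;&#039;o-10b_5Ry.qp&#039;&#039;, as produced by the -J labels used above, and the column layout of the &#039;&#039;o-*.qp&#039;&#039; file shown earlier; it is not the actual content of &#039;&#039;parse_qps.sh&#039;&#039;):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: build 10b.dat with columns: cutoff [Ry], G vectors used, valence E, conduction E&lt;br /&gt;
 rm -f 10b.dat&lt;br /&gt;
 for ry in 1 2 3 4 5; do&lt;br /&gt;
   out=o-10b_${ry}Ry.qp&lt;br /&gt;
   ngv=$(grep &amp;quot;X G&amp;quot; $out | awk &#039;{print $NF}&#039;)      # G vectors used, taken from the header line (check the header of your file)&lt;br /&gt;
   val=$(grep &amp;quot;7 * 8&amp;quot; $out | awk &#039;{print $3+$4}&#039;)   # valence QP energy, Eo + (E-Eo)&lt;br /&gt;
   con=$(grep &amp;quot;7 * 9&amp;quot; $out | awk &#039;{print $3+$4}&#039;)   # conduction QP energy&lt;br /&gt;
   echo &amp;quot;$ry $ngv $val $con&amp;quot; &amp;gt;&amp;gt; 10b.dat&lt;br /&gt;
 done&lt;br /&gt;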
&lt;br /&gt;
Once we have collected all the quasiparticle values we can plot the gap, or the valence and conduction band energies separately, as a function of the block size or energy cutoff:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:3 w lp t &amp;quot; Valence BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:3 w lp t &amp;quot;Valence BndsRnXp=20&amp;quot;,.. &lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:4 w lp t &amp;quot; Conduction BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:4 w lp t &amp;quot;Conduction BndsRnXp=20&amp;quot;,..&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot; Gap BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot;gap BndsRnXp=20&amp;quot;,..  &lt;br /&gt;
or both using e.g. the [[gnuplot_scripts|ppa_gap.gnu]] gnuplot script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect screening parameters&amp;quot;&amp;gt;&lt;br /&gt;
File:ppa2.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:ppa3.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking at the plot we can see that:&lt;br /&gt;
* The two parameters are not totally independent (see e.g. the valence quasiparticle convergence) and this is the reason why we converged them simultaneously &lt;br /&gt;
* The gap (energy difference) converges faster than the single quasiparticle states.&lt;br /&gt;
* The convergence criterion depends on the degree of accuracy we are looking for, but considering the approximations behind the calculations (plasmon pole, etc.), it is not always a good idea to enforce too strict a criterion.  &lt;br /&gt;
* Even if not fully converged, we can consider an upper limit of &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;=30 and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=3 Ry to be reasonable parameters.&lt;br /&gt;
&lt;br /&gt;
Finally, note that recent versions of Yambo (from 4.5 onwards) provide a terminator for the dielectric matrix that accelerates the convergence with the number of bands:&lt;br /&gt;
you can activate it with the &amp;lt;code&amp;gt;-V resp&amp;lt;/code&amp;gt; verbosity and by setting &amp;lt;code&amp;gt;XTermKind= &amp;quot;BG&amp;quot;&amp;lt;/code&amp;gt;. You can compare the convergence of the dielectric matrix with and without the terminator,&lt;br /&gt;
similarly to what happens for the self-energy (see the next section).&lt;br /&gt;
&lt;br /&gt;
====Converging the sum over states in the correlation self-energy====&lt;br /&gt;
From now on we will keep these parameters fixed and perform a convergence study on the sum over states in the correlation self-energy (Σc), controlled by &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to reuse the screening previously calculated, we can copy the plasmon pole databases saved in the &amp;lt;code&amp;gt;30b_3Ry&amp;lt;/code&amp;gt; directory into the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory. In this way the screening will be read by Yambo and not calculated again:&lt;br /&gt;
&lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./SAVE/.&lt;br /&gt;
&lt;br /&gt;
(Note: you may have to delete these files before running the BSE tutorials)&lt;br /&gt;
&lt;br /&gt;
In order to use the databases we have to be sure to have the same plasmon-pole parameters in our input files.&lt;br /&gt;
Edit &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; and modify &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; in order to have a number of bands ranging from 10 to 80, in different files named &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;, &#039;&#039;gw_ppa_Gbnd20.in&#039;&#039; etc. You can also run the [[bash_scripts|generate_inputs_2.sh]] bash script to generate the required input files.&lt;br /&gt;
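&lt;br /&gt;
Again as a purely illustrative sketch (not the actual &#039;&#039;generate_inputs_2.sh&#039;&#039;; it assumes the &amp;lt;code&amp;gt;GbndRnge&amp;lt;/code&amp;gt; block is written as &amp;quot;1 | 40 |&amp;quot; in the reference file), such a script could be:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: generate gw_ppa_GbndXX.in from the converged gw_ppa_30b_3Ry.in&lt;br /&gt;
 # by changing only the upper limit of GbndRnge&lt;br /&gt;
 for nb in 10 20 30 40 50 60 70 80; do&lt;br /&gt;
   sed &amp;quot;s/1 | 40 |/1 | ${nb} |/&amp;quot; gw_ppa_30b_3Ry.in &amp;gt; gw_ppa_Gbnd${nb}.in&lt;br /&gt;
 done&lt;br /&gt;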
&lt;br /&gt;
Next, launch yambo for each input:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and as done before we can inspect the obtained quasiparticle energies: &lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
for the valence band, and &lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
for the conduction band.&lt;br /&gt;
&lt;br /&gt;
Collect the results in a text file &#039;&#039;Gbnd_conv.dat&#039;&#039; containing: the number of Gbnd, the valence energy, and the conduction energy.&lt;br /&gt;
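&lt;br /&gt;
For instance, a small loop like the following sketch does this (assuming the &#039;&#039;o-GbndXX.qp&#039;&#039; naming produced by the -J labels above; note that with a single file per grep call the energy columns are $3 and $4, since there is no filename prefix):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: collect the GbndRnge convergence data (bands, valence E, conduction E)&lt;br /&gt;
 rm -f Gbnd_conv.dat&lt;br /&gt;
 for nb in 10 20 30 40 50 60 70 80; do&lt;br /&gt;
   v=$(grep &amp;quot;7 * 8&amp;quot; o-Gbnd${nb}.qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   c=$(grep &amp;quot;7 * 9&amp;quot; o-Gbnd${nb}.qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   echo &amp;quot;$nb $v $c&amp;quot; &amp;gt;&amp;gt; Gbnd_conv.dat&lt;br /&gt;
 done&lt;br /&gt;
&lt;br /&gt;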
Now, as done before we can plot the valence and conduction quasiparticle levels separately as well as the gap, as a function of the number of bands used in the summation:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2 w lp lt 7  t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3 w lp lt 7  t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2) w lp lt 7  t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect sum over states in correlation self-energy&amp;quot;&amp;gt;&lt;br /&gt;
File:Gbnd_val.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:Gbnd_cond.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:Gbnd_gap.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inspecting the plot we can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; is rather slow and many bands are needed to get converged results.&lt;br /&gt;
* As observed above the gap (energy difference) converges faster than the single quasiparticle state energies.&lt;br /&gt;
&lt;br /&gt;
====Accelerating the sum over states convergence in the correlation self-energy====&lt;br /&gt;
In general the convergence with respect to the sum over states can be very cumbersome. Here we show how it can be mitigated by using &lt;br /&gt;
a technique developed by F. Bruneval and X. Gonze &amp;lt;ref name=&amp;quot;BG&amp;quot;&amp;gt; F. Bruneval and X. Gonze, Phys. Rev. B 78, 085125 (2008)&amp;lt;/ref&amp;gt;. The basic idea consists in replacing the eigenenergies of the states that are not treated explicitly with a common energy, and taking into account all the states not explicitly included in the calculation through the closure relation.&lt;br /&gt;
To apply this technique in Yambo we need to activate the optional terminator variable [[Variables#GTermKind|GTermKind]]. You can edit the input file by setting:&lt;br /&gt;
 &lt;br /&gt;
 [[Variables#GTermKind|GTermKind]]= &amp;quot;BG&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for example in the file &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;. Now, repeat this operation also in all the other &#039;&#039;gw_ppa_GbndX.in&#039;&#039; input files.&lt;br /&gt;
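&lt;br /&gt;
Instead of editing each file by hand, a quick (illustrative) one-liner appending the variable to all of them should also work, since the order of variables in the input file is not important; double-check the report file to make sure &amp;lt;code&amp;gt;GTermKind&amp;lt;/code&amp;gt; was actually read:&lt;br /&gt;
&lt;br /&gt;
 $ for f in gw_ppa_Gbnd*.in; do echo &#039;GTermKind= &amp;quot;BG&amp;quot;&#039; &amp;gt;&amp;gt; $f; done&lt;br /&gt;
&lt;br /&gt;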
We can now perform once again the same calculations with the terminator activated&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10_term&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20_term&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and collect the new results:&lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
&lt;br /&gt;
in a new file called &#039;&#039;Gbnd_conv_terminator.dat&#039;&#039;. Now we can plot the same quantities as before by looking at the effect of having introduced the terminator. &lt;br /&gt;
&lt;br /&gt;
 gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:2 w lp t &amp;quot;Valence with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:3 w lp t &amp;quot;Conduction with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:($3-$2) w lp t &amp;quot;Gap with terminator&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Effect of the terminator in convergence of QP energies with respect sum over states&amp;quot;&amp;gt;&lt;br /&gt;
File:val_t.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:cond_t.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:gap_t.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; is accelerated in particular for the single quasiparticle states.&lt;br /&gt;
* From the plot above we can see that &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt;=40 is sufficient to have a converged gap&lt;br /&gt;
&lt;br /&gt;
===Beyond the plasmon pole approximation: a full frequency approach - real axis integration===&lt;br /&gt;
All the calculations performed up to now were based on the plasmon pole approximation (PPA). Now we remove this approximation by evaluating numerically the frequency integral in the expression of the correlation self-energy (Σc). To this aim we need to evaluate the screening on a grid of frequencies, in a range that depends on the electron-hole energy differences (energy differences between empty and occupied states) entering the sum over states. &lt;br /&gt;
Let&#039;s build the input file for a full frequency calculation by simply typing:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff.in -g n -p r&lt;br /&gt;
&lt;br /&gt;
and we can set the variables studied up to now to their converged values:&lt;br /&gt;
&lt;br /&gt;
 %[[Variables#BndsRnXd|BndsRnXd]]&lt;br /&gt;
 1 | 30 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXd|NGsBlkXd]]=3000 mRy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
 1 | 40 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]]=40000 mRy&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]] &lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 %QPkrange                      # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|8|9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and next vary, in different files (name them &#039;&#039;gw_ff10.in&#039;&#039; etc.), the number of frequencies at which we evaluate the screened Coulomb potential. This is done by setting the values of [[Variables#ETStpsXd|ETStpsXd]]=10, 50, 100, 150, 200, 250. &lt;br /&gt;
You can also run the [[bash_scripts|generate_inputs_3.sh]] bash script to generate the required input files.&lt;br /&gt;
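&lt;br /&gt;
A one-line sketch doing the same (illustrative; it assumes an &amp;lt;code&amp;gt;ETStpsXd&amp;lt;/code&amp;gt; line is already present in &#039;&#039;gw_ff.in&#039;&#039;, add it first if it is not):&lt;br /&gt;
&lt;br /&gt;
 $ for n in 10 50 100 150 200 250; do sed &amp;quot;s/ETStpsXd.*/ETStpsXd= $n/&amp;quot; gw_ff.in &amp;gt; gw_ff$n.in; done&lt;br /&gt;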
&lt;br /&gt;
Next launch yambo:&lt;br /&gt;
 $ yambo -F gw_ff10.in -J ff10&lt;br /&gt;
 $ yambo -F gw_ff50.in -J ff50&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Clearly, as we are evaluating the screening for a large number of frequencies, these calculations will be heavier than the case above, where the screening was calculated for only two frequencies (zero and the plasmon-pole frequency). &lt;br /&gt;
As before, collect the valence and conduction bands as a function of the number of frequencies in a file called &#039;&#039;gw_ff.dat&#039;&#039; and plot the behaviour of the conduction and valence bands and the gap.&lt;br /&gt;
 &lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Real axis integration, convergences with respect the used number of frequencies&amp;quot;&amp;gt;&lt;br /&gt;
File:ff_v.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt;&lt;br /&gt;
File:ff_c.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:ff_g.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
&lt;br /&gt;
* Oscillations are still present, indicating that even more frequencies have to be considered. In general, a real-axis calculation is very demanding. &lt;br /&gt;
* The final result of the gap obtained up to now does not differ too much from the one obtained at the plasmon-pole level (~50 meV)&lt;br /&gt;
&lt;br /&gt;
As the real-axis calculation is computationally demanding, in recent years an alternative method called the multipole approximation (MPA) has been developed in Yambo. The MPA can be seen as an extension of the PPA, where a finite number of poles is used instead of just one. It has been shown that even a small number of poles makes the MPA results on par with a full real-axis calculation. For your reference, at [[Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)|https://www.yambo-code.eu/wiki/index.php/Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)]] there is a tutorial on the MPA method. For now, however, let us move on to the next section.&lt;br /&gt;
&lt;br /&gt;
===Secant Solver===&lt;br /&gt;
&lt;br /&gt;
The real-axis integration also makes it possible to go beyond the linear expansion in solving the quasiparticle equation. &lt;br /&gt;
The QP equation is a non-linear equation whose solution must be found using a suitable numerical algorithm. &lt;br /&gt;
[[File:Eqp_sec.png|none|x22px|caption]] &lt;br /&gt;
The most commonly used one, based on the linearization of the self-energy operator, is the Newton method, which is the one we have used up to now. &lt;br /&gt;
Yambo can also perform a search of the QP energies using a non-linear iterative method based on the [https://en.wikipedia.org/wiki/Secant_method Secant iterative Method].&lt;br /&gt;
&lt;br /&gt;
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines to better approximate a root of a function &#039;&#039;f&#039;&#039;. The secant method can be thought of as a finite difference approximation of Newton&#039;s method. &lt;br /&gt;
The equation that defines the secant method is: &lt;br /&gt;
&lt;br /&gt;
[[File:secant_eq.png|none|x35px|caption]] &lt;br /&gt;
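&lt;br /&gt;
In formulas, the standard textbook form of the iteration reads:&lt;br /&gt;
 x_{n} = x_{n-1} - f(x_{n-1})\, \frac{x_{n-1} - x_{n-2}}{f(x_{n-1}) - f(x_{n-2})}&lt;br /&gt;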
&lt;br /&gt;
The first two iterations of the secant method are shown in the following picture. The red curve shows the function f and the blue lines are the secants.&lt;br /&gt;
&lt;br /&gt;
[[File:Secant_method.png|center|250px|caption]] &lt;br /&gt;
&lt;br /&gt;
To see whether there is any non-linear effect in the solution of the Dyson equation, we compare the result obtained with the Newton solver, as done before, with the present case. &lt;br /&gt;
In order to use the secant method you need to edit one of the previous &#039;&#039;gw_ffN.in&#039;&#039; files, e.g. &#039;&#039;gw_ff100.in&#039;&#039;, and change the value of the Dyson solver to:&lt;br /&gt;
&lt;br /&gt;
 DysSolver= &amp;quot;s&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Repeat the calculations in the same way as before:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff100.in -J ff100&lt;br /&gt;
&lt;br /&gt;
Note that now the screening will &#039;&#039;not&#039;&#039; be calculated again, as it has been stored in the &#039;&#039;ffN&#039;&#039; directories, as can be seen in the report file:&lt;br /&gt;
&lt;br /&gt;
 [05] Dynamical Dielectric Matrix&lt;br /&gt;
 ================================&lt;br /&gt;
 [RD./ff10//ndb.em1d]----&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./ff10//ndb.em1d_fragment_1]--------------&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Comparing the output files, e.g. for the case with 100 frequencies:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp&#039;&#039;  &#039;&#039;&#039;Newton Solver:&#039;&#039;&#039;&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
   7   8  -0.41188   -0.08708      2.91254 &lt;br /&gt;
   7   9   3.877976   1.421968    -3.417357 &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp_01&#039;&#039; &#039;&#039;&#039;Secant Solver:&#039;&#039;&#039;&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
   7   8   -0.41188   -0.08715      2.93518 &lt;br /&gt;
   7   9   3.877976   1.401408    -3.731649 &lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
From the comparison, we see that the effect is of the order of 20 meV, i.e. of the same order of magnitude as the accuracy of the GW calculations.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Interpolating Band Structures==&lt;br /&gt;
Up to now we have checked convergence for the gap. Now we want to calculate the quasiparticle corrections across the Brillouin zone in order to visualize the entire band structure along a path connecting high symmetry points.&lt;br /&gt;
&lt;br /&gt;
To do that we start by calculating the QP correction in the plasmon-pole approximation for all the k points of our sampling and for a number of bands around the gap. You can use a previous input file or generate a new one: &amp;lt;code&amp;gt; yambo -F gw_ppa_all_Bz.in -x -p p -g n &amp;lt;/code&amp;gt; and set the parameters found in the previous tests:&lt;br /&gt;
&lt;br /&gt;
 EXXRLvcs=  40        Ry &lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 30 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 3000            mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
and we calculate it for all the available kpoints by setting:&lt;br /&gt;
 %QPkrange                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  1| 14|  6|11|&lt;br /&gt;
 %&lt;br /&gt;
 &lt;br /&gt;
Note that, as we have already calculated the screening with these parameters, you can save time and reuse that database: either run with &amp;lt;code&amp;gt; -J 30b_3Ry &amp;lt;/code&amp;gt; so that the previous directory is used, or, if you prefer to keep the new databases in a separate all_Bz directory, create it and copy the screening databases there:&lt;br /&gt;
&lt;br /&gt;
 $ mkdir all_Bz &lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./all_Bz/&lt;br /&gt;
&lt;br /&gt;
and launch the calculation:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_all_Bz.in -J all_Bz&lt;br /&gt;
&lt;br /&gt;
Now we can inspect the output and see that it contains the corrections for all the k-points and bands we asked for:&lt;br /&gt;
&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
       1        6      -1.299712    -0.219100     3.788044&lt;br /&gt;
       1        7      -1.296430    -0.241496     3.788092&lt;br /&gt;
       1        8      -1.296420    -0.243115     3.785947&lt;br /&gt;
       1        9       4.832399     0.952386    -3.679259&lt;br /&gt;
       1        10      10.76416     2.09915     -4.38743&lt;br /&gt;
       1        11      11.36167     2.48053     -3.91021&lt;br /&gt;
....&lt;br /&gt;
By plotting some of the &#039;&#039;o-all_Bz.qp&#039;&#039; columns it is possible to discuss some physical properties of the hBN QPs. Using columns 3 and (3+4), i.e. plotting the GW energies with respect to the LDA energies, we can deduce the band gap renormalization and the stretching of the conduction/valence bands:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p  t &amp;quot;Eqp vs Elda&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[File:EqpvE0.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
Essentially we can see that the effect of the GW self-energy is the opening of the gap and a linear stretching of the conduction/valence bands, which can be estimated by performing a linear fit of the positive and negative energies separately (the zero is set at the top of the valence band). &lt;br /&gt;
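&lt;br /&gt;
A possible sketch of such a fit with gnuplot (the energy windows used here to separate valence and conduction states are only an example and should be adapted to your data):&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; fv(x) = av*x + bv&lt;br /&gt;
 gnuplot&amp;gt; fc(x) = ac*x + bc&lt;br /&gt;
 gnuplot&amp;gt; fit [-8:0] fv(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via av,bv&lt;br /&gt;
 gnuplot&amp;gt; fit [0:15] fc(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via ac,bc&lt;br /&gt;
 gnuplot&amp;gt; print &amp;quot;valence slope: &amp;quot;, av, &amp;quot;   conduction slope: &amp;quot;, ac&lt;br /&gt;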
&lt;br /&gt;
In order to calculate the band structure, however, we need to interpolate the values we have calculated above on a given path. In Yambo the interpolation is done by the executable &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; (Yambo Post Processing).&lt;br /&gt;
By typing: &lt;br /&gt;
 $ ypp -h  &lt;br /&gt;
you will recognize that in order to interpolate the bands we need to build a ypp input file using&lt;br /&gt;
 $ ypp -s b&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Before editing the &#039;&#039;ypp.in&#039;&#039; input file and running the interpolation, it is important to know that &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; uses an algorithm &amp;lt;ref&amp;gt; Warren E. Pickett, Henry Krakauer, and Philip B. Allen Phys. Rev. B 38, 2721 &amp;lt;/ref&amp;gt;  that cannot be used in presence of time-reversal (TR) symmetry. &lt;br /&gt;
As a first step we therefore remove the TR symmetry by typing:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -y&lt;br /&gt;
&lt;br /&gt;
and we uncomment the corresponding line to remove the TR.&lt;br /&gt;
&lt;br /&gt;
 fixsyms                      # [R] Reduce Symmetries&lt;br /&gt;
 #RmAllSymm                   # Remove all symmetries&lt;br /&gt;
 #RmTimeRev                   # Remove Time Reversal&lt;br /&gt;
&lt;br /&gt;
and launch&lt;br /&gt;
 &lt;br /&gt;
 $ ypp&lt;br /&gt;
&lt;br /&gt;
This will create a new directory called &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; where a &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory containing the electronic structure in the absence of TR is present. &lt;br /&gt;
We will calculate the band structure in the &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
 $ cd FixSymmm&lt;br /&gt;
&lt;br /&gt;
After having performed the usual setup&lt;br /&gt;
&lt;br /&gt;
 $yambo&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
we can generate the input for the band interpolation:&lt;br /&gt;
&lt;br /&gt;
 $ypp -s b -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
and edit the &#039;&#039;ypp_bands.in&#039;&#039; file:&lt;br /&gt;
&lt;br /&gt;
 electrons                    # [R] Electrons (and holes)&lt;br /&gt;
 bnds                         # [R] Bands&lt;br /&gt;
 INTERP_mode= &amp;quot;BOLTZ&amp;quot;         # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach) &lt;br /&gt;
 OutputAlat= 4.716000           # [a.u.] Lattice constant used for &amp;quot;alat&amp;quot; ouput format&lt;br /&gt;
 cooIn= &amp;quot;rlu&amp;quot;                   # Points coordinates (in) cc/rlu/iku/alat&lt;br /&gt;
 cooOut= &amp;quot;rlu&amp;quot;     &lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   1 | 100 |                   # Number of bands&lt;br /&gt;
 %&lt;br /&gt;
 INTERP_Shell_Fac= 20.00000     # The bigger it is a higher number of shells is used&lt;br /&gt;
 CIRCUIT_E_DB_path= &amp;quot;none&amp;quot;      # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)&lt;br /&gt;
 BANDS_path= &amp;quot;&amp;quot;                 # BANDS path points labels (G,M,K,L...)&lt;br /&gt;
 BANDS_steps= 10  &lt;br /&gt;
 #BANDS_built_in                # Print the bands of the generating points of the circuit using the nearest internal point&lt;br /&gt;
 %BANDS_kpts   &lt;br /&gt;
 % &lt;br /&gt;
&lt;br /&gt;
We modify the following lines:&lt;br /&gt;
 BANDS_steps=30&lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   6 | 11 |                   # Number of bands &lt;br /&gt;
 %&lt;br /&gt;
 %BANDS_kpts                    # K points of the bands circuit&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.00000 |0.00000 |0.00000 |&lt;br /&gt;
  0.50000 |-.50000 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.50000 |&lt;br /&gt;
  0.00000 |0.00000 |0.50000 |&lt;br /&gt;
  0.50000 |-.50000 |0.50000 |&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
which means we assign 30 points to each segment, ask to interpolate 3 occupied and 3 empty bands, and assign a path passing through the high-symmetry points K Γ M K H A L.&lt;br /&gt;
Launching:&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
will produce the output file &#039;&#039;o.bands_interpolated&#039;&#039; containing:&lt;br /&gt;
&lt;br /&gt;
                 &lt;br /&gt;
 #&lt;br /&gt;
 #   |k|        b6         b7         b8         b9         b10        b11        kx         ky         kz&lt;br /&gt;
 #&lt;br /&gt;
 #&lt;br /&gt;
     0.00000   -7.22092   -0.13402   -0.13395    4.67691    4.67694   10.08905    0.33300   -0.66667    0.00000&lt;br /&gt;
     0.03725   -7.18857   -0.17190   -0.12684    4.66126    4.71050   10.12529    0.32190   -0.64445    0.00000&lt;br /&gt;
...&lt;br /&gt;
 &lt;br /&gt;
and we can plot the bands using gnuplot:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o.bands_interpolated&amp;quot; u 0:2 w l, &amp;quot;o.bands_interpolated&amp;quot; u 0:3 w l, ...&lt;br /&gt;
&lt;br /&gt;
[[File:bands_lda.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
and you can recognize the index of the high symmetry point by inspecting the last three columns.&lt;br /&gt;
Note that up to now we have interpolated the LDA band structure. In order to plot the GW band structure, we need to tell &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; in the input file where the &#039;&#039;ndb.QP&#039;&#039; database is found. This is achieved by adding in the &#039;&#039;ypp_bands.in&#039;&#039; file the line:&lt;br /&gt;
&lt;br /&gt;
  GfnQPdb= &amp;quot;E &amp;lt; ./all_Bz/ndb.QP&amp;quot;&lt;br /&gt;
&lt;br /&gt;
and relaunch &lt;br /&gt;
&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
Now the file &#039;&#039;o.bands_interpolated_01&#039;&#039; contains the GW interpolated band structure. We can plot the LDA and GW band structure together by using the gnuplot script [[gnuplot_scripts|bands.gnu]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines  widths=500px heights=500px  perrow=2 caption=&amp;quot;Band structure of bulk hBN&amp;quot;&amp;gt;&lt;br /&gt;
File:hBN_bands.png| LDA and GW band structures &lt;br /&gt;
File:hBN_bands_lit.png| LDA and GW band structures from Ref. &amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*As expected, the effect of the GW correction is to open the gap.&lt;br /&gt;
*Comparing the obtained band structure with the one found in the literature by Arnaud and coworkers &amp;lt;ref name=&amp;quot;Arnaud&amp;quot;&amp;gt; B. Arnaud, S. Lebègue, P. Rabiller, and M. Alouani, Phys. Rev. Lett. 96, 026402 (2006)&amp;lt;/ref&amp;gt; we find very good qualitative agreement. &lt;br /&gt;
*Quantitatively we find a smaller gap: about 5.2 eV (indirect) and 5.7 eV (direct), while Ref.&amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; reports 5.95 eV for the indirect gap and a minimum direct gap of 6.47 eV. Other values are also reported in the literature, depending on the pseudopotentials used, the starting functional and the type of self-consistency (see below). &lt;br /&gt;
*The present tutorial has been run with a small k-point grid, which is an important parameter to check, so convergence with respect to the k-point sampling still has to be validated.&lt;br /&gt;
&lt;br /&gt;
==Step 4: Summary of the convergence parameters==&lt;br /&gt;
We have calculated the band structure of hBN starting from a DFT calculation. Here we summarize the main variables we have checked to achieve convergence:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; # [XX] Exchange RL components&lt;br /&gt;
Number of G-vectors in the exchange. This number should be checked carefully: generally a large number is needed, as the QP energies converge slowly with it. The calculation of the exchange part is rather fast. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; #[Xp] Polarization function bands&lt;br /&gt;
Number of bands in the independent-particle response function from which the dielectric matrix is calculated. This parameter also has to be checked carefully, together with NGsBlkXp, as the two variables are interconnected.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;  # [Xp] Response block size&lt;br /&gt;
Size of the G-vector block of the dielectric matrix. This parameter also has to be checked carefully, together with BndsRnXp. A large number of bands and a large block size can make the calculation very demanding.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]] &amp;lt;/code&amp;gt; # [Xp] [cc] Electric Field&lt;br /&gt;
Direction of the electric field for the calculation of the q=0 component of the dielectric function e(q,w). In a bulk system it can be set to (1,1,1); attention must be paid for non-3D systems.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]] &amp;lt;/code&amp;gt; # [Xp] Plasmon pole imaginary energy: this is the second frequency used to fit the Godby-Needs plasmon-pole model (PPM). If the results change appreciably when this frequency is varied, the PPM is not adequate for your calculation and you need to go beyond it, e.g. with a real-axis integration. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]] &amp;lt;/code&amp;gt; # [GW] G[W] bands range&lt;br /&gt;
Number of bands used to expand the Green&#039;s function. This number is usually larger than the number of bands used to calculate the dielectric matrix. Single quasiparticle energies converge slowly with respect to GbndRnge; energy differences behave better. You can use the terminator technique to mitigate the slow convergence. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GDamping|GDamping]] &amp;lt;/code&amp;gt; # [GW] G[W] damping&lt;br /&gt;
Small damping (the delta parameter) in the Green&#039;s function definition. The final result should not depend on it; it is usually set to 0.1 eV.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#dScStep|dScStep]] &amp;lt;/code&amp;gt; # [GW] &lt;br /&gt;
Energy step to evaluate Z factors&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#DysSolver|DysSolver]] &amp;lt;/code&amp;gt; # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
Algorithm used to solve the Dyson equation: &amp;quot;n&amp;quot; for the Newton linearization, &amp;quot;s&amp;quot; for the non-linear secant method.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GTermKind|GTermKind]] &amp;lt;/code&amp;gt; # [GW] GW terminator &lt;br /&gt;
Terminator for the self-energy&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;. We have seen how this speeds up the convergence with respect to the number of empty bands.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]] &amp;lt;/code&amp;gt; # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
K-points and band range where you want to calculate the GW correction. The syntax is&lt;br /&gt;
first kpoint | last kpoint | first band | last band&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8665</id>
		<title>GW on h-BN (standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8665"/>
		<updated>2025-05-18T13:31:18Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Accelerating the sum over states convergence in the correlation self-energy */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a modified version of the full tutorial on GW computations present on the Yambo wiki. Later, you can have a look at the extended version at [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will learn how to:&lt;br /&gt;
* Calculate quasi-particle corrections in the Hartree-Fock approximation &lt;br /&gt;
* Calculate quasi-particle corrections in the GW approximation &lt;br /&gt;
* Choose the input parameters for a meaningful converged calculation&lt;br /&gt;
* Plot a band structure including quasi-particle corrections&lt;br /&gt;
We will use bulk hBN as an example system. Before starting, you need to obtain the appropriate tarball. See instructions on the [[Tutorials|main tutorials page]]. &amp;lt;br&amp;gt;&lt;br /&gt;
We strongly recommend that you first complete the [[First steps: a walk through from DFT to optical properties]] tutorial.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &#039;&#039;&#039;Prerequisites&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
At this stage, you should have already completed the following modules:&lt;br /&gt;
* [[Generating the Yambo databases|Generating the Yambo databases]]&lt;br /&gt;
* Step 2 Run [[Initialization]] tutorial&lt;br /&gt;
Now we can start to --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The aim of the present tutorial is to obtain quasiparticle correction to energy levels using many-body perturbation theory (MBPT). &amp;lt;br&amp;gt;&lt;br /&gt;
The general non-linear quasiparticle equation reads:&lt;br /&gt;
[[File:Eqp_1.png|none|x26px|caption]] &lt;br /&gt;
As a first step we want to evaluate the self energy Σ entering in the quasiparticle equation. In the GW approach the self-energy can be separated into two components: a static term called the exchange self-energy (Σx), and a dynamical term (energy dependent) called the correlation self-energy (Σc):&lt;br /&gt;
[[File:Sigma.png|none|x25px|caption]]&lt;br /&gt;
We will treat these two terms separately and demonstrate how to set the most important variables for calculating each term.&lt;br /&gt;
In practice we will compute  the quasi-particle corrections to the one particle Kohn-Sham eigenvalues obtained through a DFT calculation. &lt;br /&gt;
&lt;br /&gt;
The steps are the following:&lt;br /&gt;
&lt;br /&gt;
==Step 1: The Exchange Self Energy or HF quasi-particle correction==&lt;br /&gt;
&lt;br /&gt;
We start by evaluating the exchange Self-Energy and the corresponding Quasiparticle energies (Hartree-Fock energies). &lt;br /&gt;
Follow the module on &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; and then return to this tutorial &#039;&#039;How to obtain the quasiparticle band structure of a bulk material: h-BN&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Step 2: The Correlation Self-Energy and Quasiparticle Energies==&lt;br /&gt;
Once we have calculated the exchange part, we next turn our attention to the more demanding dynamical part. The correlation part of the self-energy in a plane wave representation reads:&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
In the expression for the correlation self energy, we have (1) a summation over bands, (2) an integral over the Brillouin Zone, and (3) a sum over the G vectors. In contrast with the case of Σx, the summation over bands extends over &#039;&#039;all&#039;&#039; bands (including the unoccupied ones), and so convergence tests are needed. Another important difference is that the Coulomb interaction is now screened so a fundamental ingredient is the evaluation of the dynamical dielectric matrix. The expression for the dielectric matrix, calculated at the RPA level and including local field effects, has been already treated in the [[Local fields|Local fields]] tutorial.&lt;br /&gt;
&lt;br /&gt;
In the following, we will see two ways to take into account the dynamical effects. First, we will see how to set the proper parameters to obtain a model dielectric function based on a widely used approximation, which models the energy dependence of each component of the dielectric matrix with a single pole function. &lt;br /&gt;
Secondly, we will see how to perform calculations by evaluating the dielectric matrix on a regular grid of frequencies. &lt;br /&gt;
&lt;br /&gt;
Once the correlation part of the self-energy is calculated, we will check the convergence of the different parameters with respect to some final quantity, such as the gap. &lt;br /&gt;
&lt;br /&gt;
After computing the frequency dependent self-energy, we will discover that in order to solve the quasiparticle equation we will need to know its value &#039;&#039;at the value of the quasiparticle itself&#039;&#039;. In the following, unless explicitly stated, we will solve the non-linear quasi-particle equation at first order, by expanding the self-energy around the Kohn-Sham eigenvalue. In this way the quasiparticle equation reads:&lt;br /&gt;
&lt;br /&gt;
[[File:Eqp_2.png|none|x26px|caption]] &lt;br /&gt;
&lt;br /&gt;
where the normalization factor Z is defined as:&lt;br /&gt;
&lt;br /&gt;
[[File:z_fac.png|none|x40px|caption]] &lt;br /&gt;
&lt;br /&gt;
===The Plasmon Pole approximation===&lt;br /&gt;
As stated above, the basic idea of the plasmon-pole approximation is to approximate the frequency dependence of the dielectric matrix with a single pole function of the form:&lt;br /&gt;
[[File:ppa.png|none|x26px|caption]]&lt;br /&gt;
The two parameters R&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; and Ω&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; are obtained by a fit (for each component), after having calculated the RPA dielectric matrix at two given frequencies.&lt;br /&gt;
Yambo calculates the dielectric matrix in the static limit ( ω=0) and at a user defined frequency called the plasmon-pole frequency (ω=iωp). &lt;br /&gt;
Such an approximation has the big computational advantage of calculating the dielectric matrix for only two frequencies and leads to an analytical expression for the frequency integral of the correlation self-energy.&lt;br /&gt;
==== Input file generation ====&lt;br /&gt;
Let&#039;s start by building up the input file for a GW/PPA calculation, including the calculation of the exchange self-energy. From &amp;lt;code&amp;gt;yambo -H&amp;lt;/code&amp;gt; you should understand that the correct option is &amp;lt;code&amp;gt;yambo -x -p p -g n&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo -x -p p -g n -F gw_ppa.in&lt;br /&gt;
&lt;br /&gt;
Let&#039;s modify the input file in the following way: &lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] = 40         Ry    # [XX] Exchange RL components&lt;br /&gt;
 [[Variables#VXCRLvcs|VXCRLvcs]] = 3187        RL      # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;                   # [X] IP/Hartree/ALDA/LRC/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 10 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]]= 1000          mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#PPAPntXp|PPAPntXp]] = 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GDamping=  0.10000     eV    # [GW] G[W] damping&lt;br /&gt;
 dScStep=  0.10000      eV    # [GW] Energy step to evaluate Z factors&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;               # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  7|  7|  8|  9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Brief explanation of some settings:&lt;br /&gt;
* Similar to the Hartree Fock study, we will concentrate on the convergence of the &#039;&#039;&#039;direct&#039;&#039;&#039; gap of the system. Hence we select the last occupied (8) and first unoccupied (9) bands for k-point number 7 in the &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable. &lt;br /&gt;
* We also keep &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; at its converged value of 40 Ry as obtained in the &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; tutorial.&lt;br /&gt;
* For the moment we keep fixed the plasmon pole energy &amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]]&amp;lt;/code&amp;gt; at its default value (=1 Hartree).&lt;br /&gt;
* We keep fixed the direction of the electric field for the evaluation of the dielectric matrix to a non-specific value: &amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]]&amp;lt;/code&amp;gt;=(1,1,1).&lt;br /&gt;
* Later we will study convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;; for now just set them to the values indicated.&lt;br /&gt;
&lt;br /&gt;
==== Understanding the output ====&lt;br /&gt;
Let&#039;s look at the typical Yambo output. Run Yambo with an appropriate &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
   &lt;br /&gt;
 $ yambo -F gw_ppa.in -J 10b_1Ry&lt;br /&gt;
&lt;br /&gt;
In the standard output you can recognise the different steps of the calculation: calculation of the screening matrix (evaluation of the non-interacting and interacting response), calculation of the exchange self-energy, and finally the calculation of the correlation self-energy and quasiparticle energies. Moreover, information on memory usage and execution time is reported: &lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Dynamic Dielectric Matrix (PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;08s&amp;gt; Xo@q[3] |########################################| [100%] 03s(E) 03s(X)&lt;br /&gt;
 &amp;lt;08s&amp;gt; X@q[3] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [06] Bare local and non-local Exchange-Correlation&lt;br /&gt;
 &amp;lt;43s&amp;gt; EXS |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07] Dyson equation: Newton solver&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07.01] G0W0 (W PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; G0W0 (W PPA) |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; [07.02] QP properties and I/O&lt;br /&gt;
 &amp;lt;45s&amp;gt; [08] Game Over &amp;amp; Game summary&lt;br /&gt;
&lt;br /&gt;
Let&#039;s have a look at the report and output. The output file &#039;&#039;o-10b_1Ry.qp&#039;&#039; contains (for each band and k-point that we indicated in the input file) the values of the bare KS eigenvalue, its GW correction and the correlation part of the self energy:&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
  7     8       -0.411876  -0.567723   2.322443&lt;br /&gt;
  7     9        3.877976   2.413773  -2.232241&lt;br /&gt;
&lt;br /&gt;
In the header you can see the details of the calculations, for instance it reports that &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=1 Ry corresponds to 5 Gvectors: &lt;br /&gt;
&lt;br /&gt;
 #  X G`s            [used]:  5&lt;br /&gt;
&lt;br /&gt;
Other information can be found in the report file &#039;&#039;r-10b_1Ry_em1d_ppa_HF_and_locXC_gw0&#039;&#039;, such as the renormalization factor defined above, the value of the &#039;&#039;exchange&#039;&#039; self-energy (non-local XC) and of the DFT exchange-correlation potential (local XC): &lt;br /&gt;
&lt;br /&gt;
 [07.02] QP properties and I/O&lt;br /&gt;
  =============================&lt;br /&gt;
  Legend (energies in eV):&lt;br /&gt;
  - B  : Band       - Eo  : bare energy&lt;br /&gt;
  - E  : QP energy  - Z   : Renormalization factor&lt;br /&gt;
  - So : Sc(Eo)     - S   : Sc(E)&lt;br /&gt;
  - dSp: Sc derivative precision&lt;br /&gt;
  - lXC: Starting Local XC (DFT)&lt;br /&gt;
  -nlXC: Starting non-Local XC (HF)&lt;br /&gt;
  QP [eV] @ K [7] (iku): 0.000000 -0.500000  0.000000&lt;br /&gt;
   B=8 Eo= -0.41 E= -0.98 E-Eo= -0.56 Re(Z)=0.81 Im(Z)=-.2368E-2 nlXC=-19.13 lXC=-16.11 So= 2.322&lt;br /&gt;
   B=9 Eo=  3.88 E=  6.29 E-Eo=  2.41 Re(Z)=0.83 Im(Z)=-.2016E-2 nlXC=-5.536 lXC=-10.67 So=-2.232&lt;br /&gt;
&lt;br /&gt;
Extended information can also be found in the output by activating the &amp;lt;code&amp;gt;[[Variables#ExtendOut|ExtendOut]]&amp;lt;/code&amp;gt; flag in the input. This optional flag becomes available by adding the &amp;lt;code&amp;gt;-V qp&amp;lt;/code&amp;gt; verbosity option when building the input file. The plasmon pole screening, the exchange self-energy and the quasiparticle energies are also saved in databases so that they can be reused in further runs:&lt;br /&gt;
&lt;br /&gt;
 $ ls ./10b_1Ry&lt;br /&gt;
 ndb.pp ndb.pp_fragment_1 ... ndb.HF_and_locXC ndb.QP&lt;br /&gt;
&lt;br /&gt;
===Convergence tests for a quasi particle calculation===&lt;br /&gt;
&lt;br /&gt;
Now we can check the convergence of the different variables entering in the expression of the correlation part of the self energy.&amp;lt;br&amp;gt; &lt;br /&gt;
First, we focus on the parameters governing the &#039;&#039;screening matrix&#039;&#039;, which you have already seen in the [[RPA/IP]] section. In contrast to the calculation of the [[RPA/IP]] dielectric function, where you considered either the optical limit or a finite q response (EELS), here the dielectric matrix will be calculated for &#039;&#039;all&#039;&#039; q-points determined by the choice of k-point sampling.&lt;br /&gt;
 &lt;br /&gt;
The parameters that need to be converged can be understood by looking at the expression of the dielectric matrix:&lt;br /&gt;
[[File:Yambo-CH5.png|none|x30px|Yambo tutorial image]]&lt;br /&gt;
where &#039;&#039;&amp;amp;chi;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&#039;&#039; is given by&lt;br /&gt;
[[File:Dyson_rpa.png|none|x40px|Yambo tutorial image]]&lt;br /&gt;
and  &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; is given by&lt;br /&gt;
[[File:ChiO.png|none|x45px|Yambo tutorial image]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; : The dimension of the microscopic inverse matrix, related to [[Local fields]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; : The sum on bands (c,v) in the independent particle &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Converging Screening Parameters====&lt;br /&gt;
Here we will check the convergence of the gap starting from the variables controlling the screening reported above: the bands employed to build the RPA response function &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and the number of blocks (G,G&#039;) of the dielectric matrix ε&amp;lt;sup&amp;gt;-1&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;G,G&#039;&amp;lt;/sub&amp;gt;  &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
In the next section we will study convergence with respect to the sum over states summation (sum over &#039;&#039;m&#039;&#039; in the Σ&amp;lt;sub&amp;gt;c&amp;lt;/sub&amp;gt; expression); here let&#039;s fix &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; to a reasonable value (40 bands). &lt;br /&gt;
&lt;br /&gt;
Let&#039;s build a series of input files differing by the values of bands and block sizes in &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; considering &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in the range 10-50 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; in the range 1000 to 5000 mRy. To do this by hand, file by file, open the &#039;&#039;gw_ppa.in&#039;&#039; file in an editor and change to:&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]] = &#039;&#039;&#039;2000 mRy&#039;&#039;&#039;&lt;br /&gt;
while leaving the rest untouched. Repeat for 3000 mRy, 4000 mRy etc. Next, for each &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; change to:&lt;br /&gt;
 % [[Variables#BndsRnXp|BndsRnXp]]&lt;br /&gt;
   1 | 20 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
and repeat for 30, 40 and so on. Give a &#039;&#039;&#039;different name&#039;&#039;&#039; to each file: &#039;&#039;gw_ppa_Xb_YRy.in&#039;&#039; with X=10,20,30,40 and Y=1,2,3,4,5 Ry (i.e. 1000-5000 mRy).&lt;br /&gt;
&lt;br /&gt;
This is obviously quite tedious. However, you can automate both the input construction and code execution using bash or python scripts (indeed later you will learn how to use the yambo-python tool [http://www.yambo-code.org/wiki/index.php?title=GW_tutorial._Convergence_and_approximations_(BN)] for this task). For now, you can use the simple [[bash_scripts|generate_inputs_1.sh]] bash script to generate the required input files (after copying the script you need to type &amp;lt;code&amp;gt;$ chmod +x name_of_the_script.sh &amp;lt;/code&amp;gt;).&lt;br /&gt;
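&lt;br /&gt;
A minimal sketch of what such a generation script might look like (assuming the file naming used in this tutorial and GNU &amp;lt;code&amp;gt;sed&amp;lt;/code&amp;gt;; the actual [[bash_scripts|generate_inputs_1.sh]] may differ) is:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # sketch: generate gw_ppa_Xb_YRy.in files from the template gw_ppa.in&lt;br /&gt;
 for nb in 10 20 30 40; do&lt;br /&gt;
   for ry in 1 2 3 4 5; do&lt;br /&gt;
     out=gw_ppa_${nb}b_${ry}Ry.in&lt;br /&gt;
     # set the response block size&lt;br /&gt;
     sed &amp;quot;s/NGsBlkXp=.*/NGsBlkXp= ${ry} Ry/&amp;quot; gw_ppa.in &amp;gt; $out&lt;br /&gt;
     # set the upper band of the BndsRnXp block (the line after the field name)&lt;br /&gt;
     sed -i &amp;quot;/BndsRnXp/{n;s/.*/  1 | ${nb} |/}&amp;quot; $out&lt;br /&gt;
   done&lt;br /&gt;
 done&lt;br /&gt;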
&lt;br /&gt;
Finally launch the calculations:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_10b_1Ry.in -J 10b_1Ry&lt;br /&gt;
 $ yambo -F gw_ppa_10b_2Ry.in -J 10b_2Ry&lt;br /&gt;
 ...&lt;br /&gt;
 $ yambo -F gw_ppa_40b_5Ry.in -J 40b_5Ry&lt;br /&gt;
&lt;br /&gt;
Once the jobs are terminated we can collect the quasiparticle energies for fixed &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in different files named e.g. &#039;&#039;10b.dat, 20b.dat&#039;&#039; etc. for plotting, by putting in separate columns: the energy cutoff; the size of the G blocks; the quasiparticle energy of the valence band; and that of the conduction band.&lt;br /&gt;
To do this e.g. for the &#039;&#039;10b.dat&#039;&#039; file you can type:&lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 8&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
to parse the valence band quasiparticle energies  and &lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 9&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
for the conduction band; then put all the data in the &#039;&#039;10b.dat&#039;&#039; file. As there are many files to process, you can use the [[bash_scripts|parse_qps.sh]] script to create the &#039;&#039;10b.dat&#039;&#039; file, editing the script to change the &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; value for the other output files. &lt;br /&gt;
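&lt;br /&gt;
For reference, a minimal sketch of such a parsing loop (assuming the output file names used above; the actual [[bash_scripts|parse_qps.sh]] may differ) is:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # sketch: build 10b.dat with columns: cutoff (Ry), number of G-vectors, E_valence, E_conduction&lt;br /&gt;
 nb=10&lt;br /&gt;
 for ry in 1 2 3 4 5; do&lt;br /&gt;
   out=o-${nb}b_${ry}Ry.qp&lt;br /&gt;
   ng=$(grep &#039;X G&#039; $out | head -1 | awk &#039;{print $NF}&#039;)   # G-vectors from the output header (adjust the pattern if needed)&lt;br /&gt;
   ev=$(grep &amp;quot;7 * 8&amp;quot; $out | awk &#039;{print $3+$4}&#039;)             # valence QP energy&lt;br /&gt;
   ec=$(grep &amp;quot;7 * 9&amp;quot; $out | awk &#039;{print $3+$4}&#039;)             # conduction QP energy&lt;br /&gt;
   echo &amp;quot;$ry $ng $ev $ec&amp;quot; &amp;gt;&amp;gt; ${nb}b.dat&lt;br /&gt;
 done&lt;br /&gt;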
&lt;br /&gt;
Once we have collected all the quasiparticle values we can plot the gap, or the valence and conduction band energies separately, as a function of the block size or energy cutoff:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:3 w lp t &amp;quot; Valence BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:3 w lp t &amp;quot;Valence BndsRnXp=20&amp;quot;,.. &lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:4 w lp t &amp;quot; Conduction BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:4 w lp t &amp;quot;Conduction BndsRnXp=20&amp;quot;,..&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot; Gap BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot;gap BndsRnXp=20&amp;quot;,..  &lt;br /&gt;
or both using e.g. the [[gnuplot_scripts|ppa_gap.gnu]] gnuplot script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect to the screening parameters&amp;quot;&amp;gt;&lt;br /&gt;
File:ppa2.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:ppa3.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking at the plot we can see that:&lt;br /&gt;
* The two parameters are not totally independent (see e.g. the valence quasiparticle convergence), which is why we converged them simultaneously. &lt;br /&gt;
* The gap (an energy difference) converges faster than the single quasiparticle states.&lt;br /&gt;
* The convergence criterion depends on the degree of accuracy we are looking for, but considering the approximations behind the calculations (plasmon-pole etc.), it is not always a good idea to enforce too strict a criterion.  &lt;br /&gt;
* Even if not fully converged, we can consider an upper limit of &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;=30 and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=3 Ry to be reasonable parameters.&lt;br /&gt;
&lt;br /&gt;
Finally, notice that recent versions of Yambo (from 4.5 onwards) provide a terminator for the dielectric constant that accelerates convergence with the number of bands:&lt;br /&gt;
you can activate it with the &amp;lt;code&amp;gt;-V resp&amp;lt;/code&amp;gt; verbosity and by setting &amp;lt;code&amp;gt;XTermKind= &amp;quot;BG&amp;quot;&amp;lt;/code&amp;gt;. You can compare the convergence of the dielectric constant with and without the terminator,&lt;br /&gt;
similar to what happens for the self-energy (see next section).&lt;br /&gt;
&lt;br /&gt;
====Converging the sum over states in the correlation self-energy====&lt;br /&gt;
From now on we will keep these parameters fixed and perform a convergence study on the sum over states in the correlation self-energy (Σc), controlled by &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to use the previously calculated screening we can copy the plasmon pole databases saved in the &amp;lt;code&amp;gt;30b_3Ry&amp;lt;/code&amp;gt; directory into the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory. In this way the screening will be read by Yambo and not calculated again:&lt;br /&gt;
&lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./SAVE/.&lt;br /&gt;
&lt;br /&gt;
(Note: you may have to delete these files before running the BSE tutorials)&lt;br /&gt;
&lt;br /&gt;
In order to reuse the databases we have to make sure that the same plasmon-pole parameters appear in our input files.&lt;br /&gt;
Edit &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; and modify &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; in order to have a number of bands ranging from 10 to 80 inside different files named &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;, &#039;&#039;gw_ppa_Gbnd20.in&#039;&#039; etc. You can also run the [[bash_scripts|generate_inputs_2.sh]] bash script to generate the required input files; a sketch of such a loop is shown below.&lt;br /&gt;
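&lt;br /&gt;
A minimal sketch (same GNU &amp;lt;code&amp;gt;sed&amp;lt;/code&amp;gt; trick as above; the actual [[bash_scripts|generate_inputs_2.sh]] may differ):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # sketch: generate gw_ppa_GbndX.in files from the converged gw_ppa_30b_3Ry.in&lt;br /&gt;
 for nb in 10 20 30 40 50 60 70 80; do&lt;br /&gt;
   sed &amp;quot;/GbndRnge/{n;s/.*/  1 | ${nb} |/}&amp;quot; gw_ppa_30b_3Ry.in &amp;gt; gw_ppa_Gbnd${nb}.in&lt;br /&gt;
 done&lt;br /&gt;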
&lt;br /&gt;
Next, launch yambo for each input:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and as done before we can inspect the obtained quasiparticle energies: &lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
for the valence bands, and &lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
for the conduction band.&lt;br /&gt;
&lt;br /&gt;
Collect the results in a text file &#039;&#039;Gbnd_conv.dat&#039;&#039; containing: the number of Gbnd, the valence energy, and the conduction energy.&lt;br /&gt;
Now, as done before we can plot the valence and conduction quasiparticle levels separately as well as the gap, as a function of the number of bands used in the summation:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2 w lp lt 7  t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3 w lp lt 7  t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2) w lp lt 7  t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect to the sum over states in the correlation self-energy&amp;quot;&amp;gt;&lt;br /&gt;
File:Gbnd_val.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:Gbnd_cond.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:Gbnd_gap.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inspecting the plot we can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; is rather slow and many bands are needed to get converged results.&lt;br /&gt;
* As observed above the gap (energy difference) converges faster than the single quasiparticle state energies.&lt;br /&gt;
&lt;br /&gt;
====Accelerating the sum over states convergence in the correlation self-energy====&lt;br /&gt;
In general the convergence with respect to the sum over states can be very cumbersome. Here we show how it can be mitigated by using &lt;br /&gt;
a technique developed by F. Bruneval and X. Gonze &amp;lt;ref name=&amp;quot;BG&amp;quot;&amp;gt; F. Bruneval and X. Gonze, Phys. Rev. B 78, 085125 (2008)&amp;lt;/ref&amp;gt;. The basic idea consists in replacing the eigenenergies of the states that are not treated explicitly by a common energy, and taking into account, through the closure relation, all the states that are not explicitly included in the calculation.&lt;br /&gt;
To apply this technique in Yambo we need to activate the optional terminator variable [[Variables#GTermKind|GTermKind]]. You can edit the input file by setting:&lt;br /&gt;
 &lt;br /&gt;
 [[Variables#GTermKind|GTermKind]]= &amp;quot;BG&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for example in the file &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;. Now, repeat this operation also in all the other &#039;&#039;gw_ppa_GbndX.in&#039;&#039; input files.&lt;br /&gt;
We can now perform once again the same calculations with the terminator activated&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10_term&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20_term&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and collect the new results:&lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
&lt;br /&gt;
in a new file called &#039;&#039;Gbnd_conv_terminator.dat&#039;&#039;. Now we can plot the same quantities as before and look at the effect of having introduced the terminator. &lt;br /&gt;
&lt;br /&gt;
 gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:2 w lp t &amp;quot;Valence with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:3 w lp t &amp;quot;Conduction with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:($3-$2) w lp t &amp;quot;Gap with terminator&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Effect of the terminator on the convergence of QP energies with respect to the sum over states&amp;quot;&amp;gt;&lt;br /&gt;
File:val_t.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:cond_t.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:gap_t.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; is accelerated in particular for the single quasiparticle states.&lt;br /&gt;
* From the plot above we can see that &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt;=40 is sufficient to have a converged gap&lt;br /&gt;
&lt;br /&gt;
===Beyond the plasmon pole approximation: a full frequency approach - real axis integration===&lt;br /&gt;
All the calculations performed up to now were based on the plasmon pole approximation (PPA). Now we remove this approximation by numerically evaluating the frequency integral in the expression of the correlation self-energy (Σc). To this aim we need to evaluate the screening on a grid of frequencies spanning an interval determined by the electron-hole energy differences (between empty and occupied states) entering the sum over states. &lt;br /&gt;
Let&#039;s build the input file for a full frequency calculation by simply typing:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff.in -g n -p r&lt;br /&gt;
&lt;br /&gt;
and we can set the variables studied up to now at their converged values:&lt;br /&gt;
&lt;br /&gt;
 %[[Variables#BndsRnXd|BndsRnXd]]&lt;br /&gt;
 1 | 30 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXd|NGsBlkXd]]=3000 mRy&lt;br /&gt;
 %[[Variables#GbndRange|GbndRange]]&lt;br /&gt;
 1 | 40 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]]=40000 mRy&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]] &lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 %QPkrange                      # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|8|9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and next, in different files (name them &#039;&#039;gw_ff10.in&#039;&#039; etc.), vary the number of frequencies at which we evaluate the screened Coulomb potential. This is done by setting [[Variables#ETStpsXd|ETStpsXd]]=10, 50, 100, 150, 200, 250. &lt;br /&gt;
You can also run the [[bash_scripts|generate_inputs_3.sh]] bash script to generate the required input files; a sketch of an equivalent loop is shown below.&lt;br /&gt;
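&lt;br /&gt;
A minimal sketch (assuming &amp;lt;code&amp;gt;ETStpsXd&amp;lt;/code&amp;gt; may or may not already be present in the template; the actual [[bash_scripts|generate_inputs_3.sh]] may differ):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # sketch: generate gw_ffN.in files with different numbers of frequency steps&lt;br /&gt;
 for n in 10 50 100 150 200 250; do&lt;br /&gt;
   sed &#039;/ETStpsXd/d&#039; gw_ff.in &amp;gt; gw_ff${n}.in      # drop any existing ETStpsXd line&lt;br /&gt;
   echo &amp;quot;ETStpsXd= ${n}&amp;quot; &amp;gt;&amp;gt; gw_ff${n}.in       # append the desired value&lt;br /&gt;
 done&lt;br /&gt;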
&lt;br /&gt;
Next launch yambo:&lt;br /&gt;
 $ yambo -F gw_ff10.in -J ff10&lt;br /&gt;
 $ yambo -F gw_ff50.in -J ff50&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Clearly, as we are evaluating the screening for a large number of frequencies, these calculations will be heavier than the case above, where the screening was calculated for only two frequencies (zero and the plasmon-pole frequency). &lt;br /&gt;
As before, collect the valence and conduction bands as a function of the number of frequencies in a file called &#039;&#039;gw_ff.dat&#039;&#039; and plot the behaviour of the conduction and valence bands and the gap.&lt;br /&gt;
 &lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Real axis integration: convergence with respect to the number of frequencies used&amp;quot;&amp;gt;&lt;br /&gt;
File:ff_v.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt;&lt;br /&gt;
File:ff_c.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:ff_g.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
&lt;br /&gt;
* Oscillations are still present, indicating that even more frequencies have to be considered. In general, a real-axis calculation is very demanding. &lt;br /&gt;
* The gap obtained so far does not differ much from the one obtained at the plasmon-pole level (the difference is ~50 meV).&lt;br /&gt;
&lt;br /&gt;
As the real-axis calculation is computationally demanding, in recent years an alternative method called the multipole approximation (MPA) has been developed in Yambo. The MPA can be seen as an extension of the PPA, where a finite number of poles is used instead of just one. Even a small number of poles gives MPA results on par with a full real-axis calculation. For your reference, a tutorial on the MPA method is available at [[Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)|Quasi-particles and Self-energy within the Multipole Approximation (MPA)]]. For now, however, let us move on to the next section.&lt;br /&gt;
&lt;br /&gt;
===Secant Solver===&lt;br /&gt;
&lt;br /&gt;
The real-axis integration also allows us to go beyond the linear expansion when solving the quasiparticle equation. &lt;br /&gt;
The QP equation is a non-linear equation whose solution must be found using a suitable numerical algorithm. &lt;br /&gt;
[[File:Eqp_sec.png|none|x22px|caption]] &lt;br /&gt;
The most commonly used algorithm, based on the linearization of the self-energy operator, is the Newton method, which is the one we have used up to now. &lt;br /&gt;
Yambo can also perform a search of the QP energies using a non-linear iterative method based on the [https://en.wikipedia.org/wiki/Secant_method Secant iterative Method].&lt;br /&gt;
&lt;br /&gt;
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines to better approximate a root of a function &#039;&#039;f&#039;&#039;. The secant method can be thought of as a finite difference approximation of Newton&#039;s method. &lt;br /&gt;
The equation that defines the secant method is: &lt;br /&gt;
&lt;br /&gt;
[[File:secant_eq.png|none|x35px|caption]] &lt;br /&gt;
&lt;br /&gt;
The first two iterations of the secant method are shown in the following picture. The red curve shows the function f and the blue lines are the secants.&lt;br /&gt;
&lt;br /&gt;
[[File:Secant_method.png|center|250px|caption]] &lt;br /&gt;
&lt;br /&gt;
To see if there is any non-linear effect in the solution of the Dyson equation we compare the result of the calculation using the Newton solver as done before with the present case. &lt;br /&gt;
In order to use the secant method you need to edit one of the previous &#039;&#039;gw_ffN.in&#039;&#039; files, e.g. &#039;&#039;gw_ff100.in&#039;&#039;, and replace:&lt;br /&gt;
&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;&lt;br /&gt;
with &lt;br /&gt;
 DysSolver= &amp;quot;s&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Repeat the calculations in the same way as before:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff100.in -J ff100&lt;br /&gt;
&lt;br /&gt;
Note that now the screening will &#039;&#039;not&#039;&#039; be calculated again, as it has been stored in the &#039;&#039;ffN&#039;&#039; directories, as can be seen in the report file:&lt;br /&gt;
&lt;br /&gt;
 [05] Dynamical Dielectric Matrix&lt;br /&gt;
 ================================&lt;br /&gt;
 [RD./ff10//ndb.em1d]----&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./ff10//ndb.em1d_fragment_1]--------------&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Comparing the output files, e.g. for the case with 100 freqs:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp&#039;&#039;  &#039;&#039;&#039;Newton Solver:&#039;&#039;&#039;&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
   7   8  -0.41188   -0.08708      2.91254 &lt;br /&gt;
   7   9   3.877976   1.421968    -3.417357 &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp_01&#039;&#039; &#039;&#039;&#039;Secant Solver:&#039;&#039;&#039;&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
   7   8   -0.41188   -0.08715      2.93518 &lt;br /&gt;
   7   9   3.877976   1.401408    -3.731649 &lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
From the comparison, we see that the effect is of the order of 20 meV, the same order of magnitude as the accuracy of the GW calculations.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Interpolating Band Structures==&lt;br /&gt;
Up to now we have checked convergence for the gap. Now we want to calculate the quasiparticle corrections across the Brillouin zone in order to visualize the entire band structure along a path connecting high symmetry points.&lt;br /&gt;
&lt;br /&gt;
To do that we start by calculating the QP correction in the plasmon-pole approximation for all the k points of our sampling and for a number of bands around the gap. You can use a previous input file or generate a new one: &amp;lt;code&amp;gt; yambo -F gw_ppa_all_Bz.in -x -p p -g n &amp;lt;/code&amp;gt; and set the parameters found in the previous tests:&lt;br /&gt;
&lt;br /&gt;
 EXXRLvcs=  40        Ry &lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 30 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 3000            mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
and we calculate it for all the available kpoints by setting:&lt;br /&gt;
 %QPkrange                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  1| 14|  6|11|&lt;br /&gt;
 %&lt;br /&gt;
 &lt;br /&gt;
Note that, as we have already calculated the screening with these parameters, you can save time and reuse that database: either run in the previous directory with &amp;lt;code&amp;gt; -J 30b_3Ry &amp;lt;/code&amp;gt;, or, if you prefer to put the new databases in a new all_Bz directory, create that directory and copy the screening databases into it:&lt;br /&gt;
&lt;br /&gt;
 $ mkdir all_Bz &lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./all_Bz/&lt;br /&gt;
&lt;br /&gt;
and launch the calculation:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_all_Bz.in -J all_Bz&lt;br /&gt;
&lt;br /&gt;
Now we can inspect the output and see that it contains the corrections for all the k-points and bands we requested:&lt;br /&gt;
&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
    1.000000     6.000000    -1.299712    -0.219100     3.788044&lt;br /&gt;
    1.000000     7.000000    -1.296430    -0.241496     3.788092&lt;br /&gt;
    1.000000     8.000000    -1.296420    -0.243115     3.785947&lt;br /&gt;
    1.000000     9.000000     4.832399     0.952386    -3.679259&lt;br /&gt;
    1.00000     10.00000     10.76416      2.09915     -4.38743&lt;br /&gt;
    1.00000     11.00000     11.36167      2.48053     -3.91021&lt;br /&gt;
....&lt;br /&gt;
By plotting some of the &#039;&#039;o-all_Bz.qp&#039;&#039; columns it is possible to discuss some physical properties of the hBN QPs. Using columns 3 and (3+4), i.e. plotting the GW energies with respect to the LDA energies, we can deduce the band gap renormalization and the stretching of the conduction/valence bands:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p  t &amp;quot;Eqp vs Elda&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[File:EqpvE0.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
Essentially we can see that the effect of the GW self-energy is the opening of the gap and a linear stretching of the conduction/valence bands that can be estimated by performing a linear fit of the positive and negative energies (the zero is set at the top of the valence band). &lt;br /&gt;
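&lt;br /&gt;
A quick way to estimate these stretching parameters is to fit the occupied (negative Eo) and empty (positive Eo) states separately, directly in gnuplot. The following is only a sketch: the fitted slopes bv and bc give the valence and conduction stretching, while the intercepts reflect the gap opening.&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; fv(x) = av + bv*x&lt;br /&gt;
 gnuplot&amp;gt; fc(x) = ac + bc*x&lt;br /&gt;
 gnuplot&amp;gt; fit [:0] fv(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via av,bv&lt;br /&gt;
 gnuplot&amp;gt; fit [0:] fc(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via ac,bc&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p t &amp;quot;Eqp vs Elda&amp;quot;, fv(x) t &amp;quot;valence fit&amp;quot;, fc(x) t &amp;quot;conduction fit&amp;quot;&lt;br /&gt;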
&lt;br /&gt;
In order to calculate the band structure, however, we need to interpolate the values we have calculated above on a given path. In Yambo the interpolation is done by the executable &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; (Yambo Post Processing).&lt;br /&gt;
By typing: &lt;br /&gt;
 $ ypp -h  &lt;br /&gt;
you will recognize that in order to interpolate the bands we need to build a ypp input file using&lt;br /&gt;
 $ ypp -s b&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Before editing the &#039;&#039;ypp.in&#039;&#039; input file and running the interpolation, it is important to know that &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; uses an algorithm &amp;lt;ref&amp;gt; Warren E. Pickett, Henry Krakauer, and Philip B. Allen Phys. Rev. B 38, 2721 &amp;lt;/ref&amp;gt;  that cannot be used in presence of time-reversal (TR) symmetry. &lt;br /&gt;
As a first step we therefore remove the TR symmetry by typing:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -y&lt;br /&gt;
&lt;br /&gt;
and we uncomment the corresponding line to remove the TR.&lt;br /&gt;
&lt;br /&gt;
 fixsyms                      # [R] Reduce Symmetries&lt;br /&gt;
 #RmAllSymm                   # Remove all symmetries&lt;br /&gt;
 #RmTimeRev                   # Remove Time Reversal&lt;br /&gt;
&lt;br /&gt;
and launch&lt;br /&gt;
 &lt;br /&gt;
 $ ypp&lt;br /&gt;
&lt;br /&gt;
This will create a new directory called &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; where a &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory containing the electronic structure in the absence of TR is present. &lt;br /&gt;
We will calculate the band structure in the &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
 $ cd FixSymm&lt;br /&gt;
&lt;br /&gt;
After having performed the usual setup&lt;br /&gt;
&lt;br /&gt;
 $yambo&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
we can generate the input for the band interpolation:&lt;br /&gt;
&lt;br /&gt;
 $ypp -s b -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
and edit the &#039;&#039;ypp_bands.in&#039;&#039; file:&lt;br /&gt;
&lt;br /&gt;
 electrons                    # [R] Electrons (and holes)&lt;br /&gt;
 bnds                         # [R] Bands&lt;br /&gt;
 INTERP_mode= &amp;quot;BOLTZ&amp;quot;         # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach) &lt;br /&gt;
 OutputAlat= 4.716000           # [a.u.] Lattice constant used for &amp;quot;alat&amp;quot; ouput format&lt;br /&gt;
 cooIn= &amp;quot;rlu&amp;quot;                   # Points coordinates (in) cc/rlu/iku/alat&lt;br /&gt;
 cooOut= &amp;quot;rlu&amp;quot;     &lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   1 | 100 |                   # Number of bands&lt;br /&gt;
 %&lt;br /&gt;
 INTERP_Shell_Fac= 20.00000     # The bigger it is a higher number of shells is used&lt;br /&gt;
 CIRCUIT_E_DB_path= &amp;quot;none&amp;quot;      # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)&lt;br /&gt;
 BANDS_path= &amp;quot;&amp;quot;                 # BANDS path points labels (G,M,K,L...)&lt;br /&gt;
 BANDS_steps= 10  &lt;br /&gt;
 #BANDS_built_in                # Print the bands of the generating points of the circuit using the nearest internal point&lt;br /&gt;
 %BANDS_kpts   &lt;br /&gt;
 % &lt;br /&gt;
&lt;br /&gt;
We modify the following lines:&lt;br /&gt;
 BANDS_steps=30&lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   6 | 11 |                   # Number of bands &lt;br /&gt;
 %&lt;br /&gt;
 %BANDS_kpts                    # K points of the bands circuit&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.00000 |0.00000 |0.00000 |&lt;br /&gt;
  0.50000 |-.50000 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.50000 |&lt;br /&gt;
  0.00000 |0.00000 |0.50000 |&lt;br /&gt;
  0.50000 |-.50000 |0.50000 |&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
which means we assign 30 points to each segment, we ask to interpolate 3 occupied and 3 empty bands, and we assign the following path passing through the high symmetry points: K Γ M K H A L.&lt;br /&gt;
Launching:&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
will produce the output file &#039;&#039;o.bands_interpolated&#039;&#039; containing:&lt;br /&gt;
&lt;br /&gt;
                 &lt;br /&gt;
 #&lt;br /&gt;
 #   |k|        b6         b7         b8         b9         b10        b11        kx         ky         kz&lt;br /&gt;
 #&lt;br /&gt;
 #&lt;br /&gt;
     0.00000   -7.22092   -0.13402   -0.13395    4.67691    4.67694   10.08905    0.33300   -0.66667    0.00000&lt;br /&gt;
     0.03725   -7.18857   -0.17190   -0.12684    4.66126    4.71050   10.12529    0.32190   -0.64445    0.00000&lt;br /&gt;
...&lt;br /&gt;
 &lt;br /&gt;
and we can plot the bands using gnuplot:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o.bands_interpolated&amp;quot; u 0:2 w l, &amp;quot;o.bands_interpolated&amp;quot; u 0:3 w l, ...&lt;br /&gt;
&lt;br /&gt;
[[File:bands_lda.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
and you can recognize the positions of the high symmetry points by inspecting the last three columns.&lt;br /&gt;
Note that up to now we have interpolated the LDA band structure. In order to plot the GW band structure, we need to tell &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; in the input file where the &#039;&#039;ndb.QP&#039;&#039; database is found. This is achieved by adding in the &#039;&#039;ypp_bands.in&#039;&#039; file the line:&lt;br /&gt;
&lt;br /&gt;
  GfnQPdb= &amp;quot;E &amp;lt; ./all_Bz/ndb.QP&amp;quot;&lt;br /&gt;
&lt;br /&gt;
and relaunch &lt;br /&gt;
&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
Now the file &#039;&#039;o.bands_interpolated_01&#039;&#039; contains the GW interpolated band structure. We can plot the LDA and GW band structure together by using the gnuplot script [[gnuplot_scripts|bands.gnu]].&lt;br /&gt;
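&lt;br /&gt;
If you prefer a quick look without the script, a compact gnuplot command (a sketch; the bands are in columns 2-7 and |k| is in column 1) overlaying the two files is:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p for [i=2:7] &amp;quot;o.bands_interpolated&amp;quot; u 1:i w l lc rgb &amp;quot;black&amp;quot; notitle, for [i=2:7] &amp;quot;o.bands_interpolated_01&amp;quot; u 1:i w l lc rgb &amp;quot;red&amp;quot; notitle&lt;br /&gt;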
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines  widths=500px heights=500px  perrow=2 caption=&amp;quot;Band structure of bulk hBN&amp;quot;&amp;gt;&lt;br /&gt;
File:hBN_bands.png| LDA and GW band structure &lt;br /&gt;
File:hBN_bands_lit.png| LDA and GW band structure from Ref. &amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*As expected the effect of the GW correction is to open the gap.&lt;br /&gt;
*Comparing the obtained band structure with the one found in the literature by Arnaud and coworkers &amp;lt;ref name=&amp;quot;Arnaud&amp;quot;&amp;gt; B. Arnaud, S. Lebegue, P. Rabiller, and M. Alouani, Phys. Rev. Lett. 96, 026402 (2006)&amp;lt;/ref&amp;gt; we find very nice qualitative agreement. &lt;br /&gt;
*Quantitatively we find a smaller gap: about 5.2 eV (indirect gap) and 5.7 eV (direct gap), while Ref.&amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; reports 5.95 eV for the indirect gap and a minimum direct band gap of 6.47 eV. Other values are also reported in the literature, depending on the pseudopotentials used, the starting functional and the type of self-consistency (see below). &lt;br /&gt;
*The present tutorial has been run with a small k-point grid, which is an important parameter to check, so convergence with respect to the k-point sampling has to be validated.&lt;br /&gt;
&lt;br /&gt;
==Step 4: Summary of the convergence parameters==&lt;br /&gt;
We have calculated the band structure of hBN starting from a DFT calculation. Here we summarize the main variables we have checked to achieve convergence:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; # [XX] Exchange RL components&lt;br /&gt;
Number of G-vectors in the exchange. This number should be checked carefully. Generally a large number is needed, as the QP energies converge slowly with it. The calculation of the exchange part is rather fast. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; #[Xp] Polarization function bands&lt;br /&gt;
Number of bands in the independent-particle response function from which the dielectric matrix is calculated. This parameter also has to be checked carefully, together with NGsBlkXp, as the two variables are interconnected.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;  # [Xp] Response block size&lt;br /&gt;
Size of the (G,G&#039;) block of the dielectric matrix. This parameter also has to be checked carefully, together with BndsRnXp. A large number of bands and a large block size can make the calculation very demanding.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]] &amp;lt;/code&amp;gt; # [Xp] [cc] Electric Field&lt;br /&gt;
Direction of the electric field for the calculation of the q=0 component of the dielectric function e(q,w). In a bulk system it can be set to (1,1,1); attention must be paid for non-3D systems.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]] &amp;lt;/code&amp;gt; # [Xp] Plasmon pole imaginary energy: this is the second frequency used to fit the Godby-Needs plasmon-pole model (PPM). If the results change significantly when this frequency is varied, the PPM is not adequate for your calculation and you need to go beyond it, e.g. with a real-axis integration. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]] &amp;lt;/code&amp;gt; # [GW] G[W] bands range&lt;br /&gt;
Number of bands used to expand the Green&#039;s function. This number is usually larger than the number of bands used to calculate the dielectric constant. Single quasiparticle energies converge slowly with respect to GbndRnge, while energy differences behave better. You can use the terminator technique to mitigate this slow convergence. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GDamping|GDamping]] &amp;lt;/code&amp;gt; # [GW] G[W] damping&lt;br /&gt;
Small damping in the Green&#039;s function definition (the delta &lt;br /&gt;
parameter). The final result should not depend on it; it is usually set to 0.1 eV.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#dScStep|dScStep]] &amp;lt;/code&amp;gt; # [GW] &lt;br /&gt;
Energy step to evaluate Z factors&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#DysSolver|DysSolver]] &amp;lt;/code&amp;gt; # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
Solver used for the Dyson equation: &amp;quot;n&amp;quot; Newton linearization, &amp;quot;s&amp;quot; non-linear secant method.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GTermKind|GTermKind]] &amp;lt;/code&amp;gt; [GW] GW terminator &lt;br /&gt;
Terminator for the self-energy&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;. We have seen how this speeds up the convergence with respect to the number of empty bands.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#%QPkrange |QPkrange ]] &amp;lt;/code&amp;gt; # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
K-points and band range where you want to calculate the GW correction. The syntax is&lt;br /&gt;
first kpoint | last kpoint | first band | last band&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8663</id>
		<title>GW on h-BN (standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8663"/>
		<updated>2025-05-18T13:29:32Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a modified version of the full tutorial on GW computations present on the Yambo wiki. Later, you can have a look at the extended version at [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will learn how to:&lt;br /&gt;
* Calculate quasi-particle corrections in the Hartree-Fock approximation &lt;br /&gt;
* Calculate quasi-particle corrections in the GW approximation &lt;br /&gt;
* Choose the input parameters for a meaningful converged calculation&lt;br /&gt;
* Plot a band structure including quasi-particle corrections&lt;br /&gt;
We will use bulk hBN as an example system. Before starting, you need to obtain the appropriate tarball. See instructions on the [[Tutorials|main tutorials page]]. &amp;lt;br&amp;gt;&lt;br /&gt;
We strongly recommend that you first complete the [[First steps: a walk through from DFT to optical properties]] tutorial.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &#039;&#039;&#039;Prerequisites&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
At this stage, you should have already completed the following modules:&lt;br /&gt;
* [[Generating the Yambo databases|Generating the Yambo databases]]&lt;br /&gt;
* Step 2 Run [[Initialization]] tutorial&lt;br /&gt;
Now we can start to --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The aim of the present tutorial is to obtain quasiparticle correction to energy levels using many-body perturbation theory (MBPT). &amp;lt;br&amp;gt;&lt;br /&gt;
The general non-linear quasiparticle equation reads:&lt;br /&gt;
[[File:Eqp_1.png|none|x26px|caption]] &lt;br /&gt;
As a first step we want to evaluate the self energy Σ entering in the quasiparticle equation. In the GW approach the self-energy can be separated into two components: a static term called the exchange self-energy (Σx), and a dynamical term (energy dependent) called the correlation self-energy (Σc):&lt;br /&gt;
[[File:Sigma.png|none|x25px|caption]]&lt;br /&gt;
We will treat these two terms separately and demonstrate how to set the most important variables for calculating each term.&lt;br /&gt;
In practice we will compute  the quasi-particle corrections to the one particle Kohn-Sham eigenvalues obtained through a DFT calculation. &lt;br /&gt;
&lt;br /&gt;
The steps are the following:&lt;br /&gt;
&lt;br /&gt;
==Step 1: The Exchange Self Energy or HF quasi-particle correction==&lt;br /&gt;
&lt;br /&gt;
We start by evaluating the exchange Self-Energy and the corresponding Quasiparticle energies (Hartree-Fock energies). &lt;br /&gt;
Follow the module on &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; and then return to this tutorial &#039;&#039;How to obtain the quasiparticle band structure of a bulk material: h-BN&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Step 2: The Correlation Self-Energy and Quasiparticle Energies==&lt;br /&gt;
Once we have calculated the exchange part, we next turn our attention to the more demanding dynamical part. The correlation part of the self-energy in a plane wave representation reads:&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
In the expression for the correlation self energy, we have (1) a summation over bands, (2) an integral over the Brillouin Zone, and (3) a sum over the G vectors. In contrast with the case of Σx, the summation over bands extends over &#039;&#039;all&#039;&#039; bands (including the unoccupied ones), and so convergence tests are needed. Another important difference is that the Coulomb interaction is now screened so a fundamental ingredient is the evaluation of the dynamical dielectric matrix. The expression for the dielectric matrix, calculated at the RPA level and including local field effects, has been already treated in the [[Local fields|Local fields]] tutorial.&lt;br /&gt;
&lt;br /&gt;
In the following, we will see two ways to take into account the dynamical effects. First, we will see how to set the proper parameters to obtain a model dielectric function based on a widely used approximation, which models the energy dependence of each component of the dielectric matrix with a single pole function. &lt;br /&gt;
Secondly, we will see how to perform calculations by evaluating the dielectric matrix on a regular grid of frequencies. &lt;br /&gt;
&lt;br /&gt;
Once the correlation part of the self-energy is calculated, we will check the convergence of the different parameters with respect to some final quantity, such as the gap. &lt;br /&gt;
&lt;br /&gt;
After computing the frequency dependent self-energy, we will discover that in order to solve the quasiparticle equation we will need to know its value &#039;&#039;at the value of the quasiparticle itself&#039;&#039;. In the following, unless explicitly stated, we will solve the non-linear quasi-particle equation at first order, by expanding the self-energy around the Kohn-Sham eigenvalue. In this way the quasiparticle equation reads:&lt;br /&gt;
&lt;br /&gt;
[[File:Eqp_2.png|none|x26px|caption]] &lt;br /&gt;
&lt;br /&gt;
where the normalization factor Z is defined as:&lt;br /&gt;
&lt;br /&gt;
[[File:z_fac.png|none|x40px|caption]] &lt;br /&gt;
&lt;br /&gt;
===The Plasmon Pole approximation===&lt;br /&gt;
As stated above, the basic idea of the plasmon-pole approximation is to approximate the frequency dependence of the dielectric matrix with a single pole function of the form:&lt;br /&gt;
[[File:ppa.png|none|x26px|caption]]&lt;br /&gt;
The two parameters R&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; and Ω&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; are obtained by a fit (for each component), after having calculated the RPA dielectric matrix at two given frequencies.&lt;br /&gt;
Yambo calculates the dielectric matrix in the static limit (ω=0) and at a user-defined frequency called the plasmon-pole frequency (ω=iωp). &lt;br /&gt;
Such an approximation has the big computational advantage of requiring the dielectric matrix at only two frequencies, and it leads to an analytical expression for the frequency integral of the correlation self-energy.&lt;br /&gt;
==== Input file generation ====&lt;br /&gt;
Let&#039;s start by building up the input file for a GW/PPA calculation, including the calculation of the exchange self-energy. From &amp;lt;code&amp;gt;yambo -H&amp;lt;/code&amp;gt; you should understand that the correct option is &amp;lt;code&amp;gt;yambo -x -p p -g n&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo -x -p p -g n -F gw_ppa.in&lt;br /&gt;
&lt;br /&gt;
Let&#039;s modify the input file in the following way: &lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] = 40         Ry    # [XX] Exchange RL components&lt;br /&gt;
 [[Variables#VXCRLvcs|VXCRLvcs]] = 3187        RL      # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;                   # [X] IP/Hartree/ALDA/LRC/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 10 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]]= 1000          mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#PPAPntXp|PPAPntXp]] = 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GDamping=  0.10000     eV    # [GW] G[W] damping&lt;br /&gt;
 dScStep=  0.10000      eV    # [GW] Energy step to evaluate Z factors&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;               # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  7|  7|  8|  9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Brief explanation of some settings:&lt;br /&gt;
* Similar to the Hartree Fock study, we will concentrate on the convergence of the &#039;&#039;&#039;direct&#039;&#039;&#039; gap of the system. Hence we select the last occupied (8) and first unoccupied (9) bands for k-point number 7 in the &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable. &lt;br /&gt;
* We also keep &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; at its converged value of 40 Ry as obtained in the &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; tutorial.&lt;br /&gt;
* For the moment we keep fixed the plasmon pole energy &amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]]&amp;lt;/code&amp;gt; at its default value (=1 Hartree).&lt;br /&gt;
* We fix the direction of the electric field for the evaluation of the dielectric matrix to a generic value: &amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]]&amp;lt;/code&amp;gt;=(1,1,1).&lt;br /&gt;
* Later we will study convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;; for now just set them to the values indicated.&lt;br /&gt;
&lt;br /&gt;
==== Understanding the output ====&lt;br /&gt;
Let&#039;s look at the typical Yambo output. Run Yambo with an appropriate &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
   &lt;br /&gt;
 $ yambo -F gw_ppa.in -J 10b_1Ry&lt;br /&gt;
&lt;br /&gt;
In the standard output you can recognise the different steps of the calculation: calculation of the screening matrix (evaluation of the non-interacting and interacting response), calculation of the exchange self-energy, and finally the calculation of the correlation self-energy and quasiparticle energies. Moreover, information on memory usage and execution time is reported: &lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Dynamic Dielectric Matrix (PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;08s&amp;gt; Xo@q[3] |########################################| [100%] 03s(E) 03s(X)&lt;br /&gt;
 &amp;lt;08s&amp;gt; X@q[3] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [06] Bare local and non-local Exchange-Correlation&lt;br /&gt;
 &amp;lt;43s&amp;gt; EXS |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07] Dyson equation: Newton solver&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07.01] G0W0 (W PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; G0W0 (W PPA) |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; [07.02] QP properties and I/O&lt;br /&gt;
 &amp;lt;45s&amp;gt; [08] Game Over &amp;amp; Game summary&lt;br /&gt;
&lt;br /&gt;
Let&#039;s have a look at the report and output. The output file &#039;&#039;o-10b_1Ry.qp&#039;&#039; contains (for each band and k-point that we indicated in the input file) the values of the bare KS eigenvalue, its GW correction and the correlation part of the self energy:&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
  7     8       -0.411876  -0.567723   2.322443&lt;br /&gt;
  7     9        3.877976   2.413773  -2.232241&lt;br /&gt;
&lt;br /&gt;
In the header you can see the details of the calculation; for instance, it reports that &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=1 Ry corresponds to 5 G-vectors: &lt;br /&gt;
&lt;br /&gt;
 #  X G`s            [used]:  5&lt;br /&gt;
&lt;br /&gt;
Other information can be found in the report file &#039;&#039;r-10b_1Ry_em1d_ppa_HF_and_locXC_gw0&#039;&#039;, such as the renormalization factor defined above, the value of the &#039;&#039;exchange&#039;&#039; self-energy (non-local XC) and of the DFT exchange-correlation potential (local XC): &lt;br /&gt;
&lt;br /&gt;
 [07.02] QP properties and I/O&lt;br /&gt;
  =============================&lt;br /&gt;
  Legend (energies in eV):&lt;br /&gt;
  - B  : Band       - Eo  : bare energy&lt;br /&gt;
  - E  : QP energy  - Z   : Renormalization factor&lt;br /&gt;
  - So : Sc(Eo)     - S   : Sc(E)&lt;br /&gt;
  - dSp: Sc derivative precision&lt;br /&gt;
  - lXC: Starting Local XC (DFT)&lt;br /&gt;
  -nlXC: Starting non-Local XC (HF)&lt;br /&gt;
  QP [eV] @ K [7] (iku): 0.000000 -0.500000  0.000000&lt;br /&gt;
   B=8 Eo= -0.41 E= -0.98 E-Eo= -0.56 Re(Z)=0.81 Im(Z)=-.2368E-2 nlXC=-19.13 lXC=-16.11 So= 2.322&lt;br /&gt;
   B=9 Eo=  3.88 E=  6.29 E-Eo=  2.41 Re(Z)=0.83 Im(Z)=-.2016E-2 nlXC=-5.536 lXC=-10.67 So=-2.232&lt;br /&gt;
&lt;br /&gt;
Extended information can also be found in the output by activating the &amp;lt;code&amp;gt;[[Variables#ExtendOut|ExtendOut]]&amp;lt;/code&amp;gt; flag in the input. This is an optional flag that can be activated by adding the &amp;lt;code&amp;gt;-V qp&amp;lt;/code&amp;gt; verbosity option when building the input file. The plasmon-pole screening, the exchange self-energy and the quasiparticle energies are also saved in databases so that they can be reused in further runs:&lt;br /&gt;
&lt;br /&gt;
 $ ls ./10b_1Ry&lt;br /&gt;
 ndb.pp ndb.pp_fragment_1 ... ndb.HF_and_locXC ndb.QP&lt;br /&gt;
&lt;br /&gt;
===Convergence tests for a quasi particle calculation===&lt;br /&gt;
&lt;br /&gt;
Now we can check the convergence of the different variables entering the expression of the correlation part of the self-energy.&amp;lt;br&amp;gt; &lt;br /&gt;
First, we focus on the parameters governing the &#039;&#039;screening matrix&#039;&#039; that you have already seen in the [[RPA/IP]] section. In contrast to the calculation of the [[RPA/IP]] dielectric function, where you considered either the optical limit or a finite-q response (EELS), here the dielectric matrix will be calculated for &#039;&#039;all&#039;&#039; q-points determined by the choice of k-point sampling.&lt;br /&gt;
 &lt;br /&gt;
The parameters that need to be converged can be understood by looking at expression of the dielectric matrix:&lt;br /&gt;
[[File:Yambo-CH5.png|none|x30px|Yambo tutorial image]]&lt;br /&gt;
where &#039;&#039;&amp;amp;chi;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&#039;&#039; is given by&lt;br /&gt;
[[File:Dyson_rpa.png|none|x40px|Yambo tutorial image]]&lt;br /&gt;
and  &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; is given by&lt;br /&gt;
[[File:ChiO.png|none|x45px|Yambo tutorial image]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; : The dimension of the microscopic inverse matrix, related to [[Local fields]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; : The sum on bands (c,v) in the independent particle &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Converging Screening Parameters====&lt;br /&gt;
Here we will check the convergence of the gap starting from the variables controlling the screening reported above: the bands employed to build the RPA response function &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and the number of blocks (G,G&#039;) of the dielectric matrix ε&amp;lt;sup&amp;gt;-1&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;G,G&#039;&amp;lt;/sub&amp;gt;  &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
In the next section we will study convergence with respect to the sum over states (the sum over &#039;&#039;m&#039;&#039; in the Σ&amp;lt;sub&amp;gt;c&amp;lt;/sub&amp;gt; expression); here let&#039;s fix &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; to a reasonable value (40 bands). &lt;br /&gt;
&lt;br /&gt;
Let&#039;s build a series of input files differing by the values of bands and block sizes in &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; considering &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in the range 10-50 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; in the range 1000 to 5000 mRy. To do this by hand, file by file, open the &#039;&#039;gw_ppa.in&#039;&#039; file in an editor and change to:&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]] = &#039;&#039;&#039;2000 mRy&#039;&#039;&#039;&lt;br /&gt;
while leaving the rest untouched. Repeat for 3000 mRy, 4000 mRy etc. Next, for each &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; change to:&lt;br /&gt;
 % [[Variables#BndsRnXp|BndsRnXp]]&lt;br /&gt;
   1 | 20 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
and repeat for 30, 40 and so on. Give a &#039;&#039;&#039;different name&#039;&#039;&#039; to each file: &#039;&#039;gw_ppa_Xb_YRy.in&#039;&#039; with X=10,20,30,40 and Y=1,2,3,4,5 Ry (i.e. 1000-5000 mRy).&lt;br /&gt;
&lt;br /&gt;
This is obviously quite tedious. However, you can automate both the input construction and code execution using bash or python scripts (indeed, later you will learn how to use the yambo-python [http://www.yambo-code.org/wiki/index.php?title=GW_tutorial._Convergence_and_approximations_(BN)] tool for this task). For now, you can use the simple [[bash_scripts|generate_inputs_1.sh]] bash script to generate the required input files (after copying the script you need to type &amp;lt;code&amp;gt;$ chmod +x name_of_the_script.sh &amp;lt;/code&amp;gt;).&lt;br /&gt;
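&lt;br /&gt;
As a reference, here is a minimal sketch of what such a generation script could look like (a hedged example assuming the &#039;&#039;gw_ppa.in&#039;&#039; template produced above and simple &amp;lt;code&amp;gt;sed&amp;lt;/code&amp;gt; substitutions; the actual &#039;&#039;generate_inputs_1.sh&#039;&#039; may differ):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: generate gw_ppa_Xb_YRy.in files from the gw_ppa.in template&lt;br /&gt;
 for b in 10 20 30 40; do&lt;br /&gt;
   for e in 1 2 3 4 5; do&lt;br /&gt;
     # change the BndsRnXp upper limit and the NGsBlkXp cutoff (in mRy)&lt;br /&gt;
     sed -e &amp;quot;s/1 | 10 |/1 | ${b} |/&amp;quot; -e &amp;quot;s/NGsBlkXp= 1000/NGsBlkXp= ${e}000/&amp;quot; gw_ppa.in &amp;gt; gw_ppa_${b}b_${e}Ry.in&lt;br /&gt;
   done&lt;br /&gt;
 done&lt;br /&gt;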
&lt;br /&gt;
Finally launch the calculations:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_10b_1Ry.in -J 10b_1Ry&lt;br /&gt;
 $ yambo -F gw_ppa_10b_2Ry.in -J 10b_2Ry&lt;br /&gt;
 ...&lt;br /&gt;
 $ yambo -F gw_ppa_40b_5Ry.in -J 40b_5Ry&lt;br /&gt;
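&lt;br /&gt;
Equivalently, the whole series can be launched with a simple shell loop (a sketch assuming the file names defined above):&lt;br /&gt;
&lt;br /&gt;
 for b in 10 20 30 40; do&lt;br /&gt;
   for e in 1 2 3 4 5; do&lt;br /&gt;
     yambo -F gw_ppa_${b}b_${e}Ry.in -J ${b}b_${e}Ry&lt;br /&gt;
   done&lt;br /&gt;
 done&lt;br /&gt;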
&lt;br /&gt;
Once the jobs are terminated we can collect the quasiparticle energies for fixed &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in different files named e.g. &#039;&#039;10b.dat, 20b.dat&#039;&#039; etc. for plotting, by putting in separate columns: the energy cutoff; the size of the G blocks; the quasiparticle energy of the valence band; and that of the conduction band.&lt;br /&gt;
To do this e.g. for the &#039;&#039;10b.dat&#039;&#039; file you can type:&lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 8&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
to parse the valence band quasiparticle energies  and &lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 9&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
for the conduction band; and put all the data in the &#039;&#039;10b.dat&#039;&#039; file. As there are many files to process, you can use the [[bash_scripts|parse_qps.sh]] script to create the &#039;&#039;10b.dat&#039;&#039; file, editing the script to change the &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; value for the other output files. &lt;br /&gt;
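&lt;br /&gt;
As a reference, a minimal sketch of such a parsing loop for the &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;=10 series could look as follows (a hedged example; the actual &#039;&#039;parse_qps.sh&#039;&#039; may be organised differently):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: build 10b.dat with columns: cutoff (Ry), number of G vectors, valence QP, conduction QP&lt;br /&gt;
 rm -f 10b.dat&lt;br /&gt;
 for e in 1 2 3 4 5; do&lt;br /&gt;
   ng=$(grep &amp;quot;X G&amp;quot; o-10b_${e}Ry.qp | awk &#039;{print $NF}&#039;)&lt;br /&gt;
   ev=$(grep &amp;quot;7 * 8&amp;quot; o-10b_${e}Ry.qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   ec=$(grep &amp;quot;7 * 9&amp;quot; o-10b_${e}Ry.qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   echo &amp;quot;$e $ng $ev $ec&amp;quot; &amp;gt;&amp;gt; 10b.dat&lt;br /&gt;
 done&lt;br /&gt;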
&lt;br /&gt;
Once we have collected all the quasiparticle values we can plot the gap, or the valence and conduction band energies separately, as a function of the block size or energy cutoff:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:3 w lp t &amp;quot; Valence BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:3 w lp t &amp;quot;Valence BndsRnXp=20&amp;quot;,.. &lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:4 w lp t &amp;quot; Conduction BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:4 w lp t &amp;quot;Conduction BndsRnXp=20&amp;quot;,..&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot; Gap BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot;gap BndsRnXp=20&amp;quot;,..  &lt;br /&gt;
or both using e.g. the [[gnuplot_scripts|ppa_gap.gnu]] gnuplot script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect screening parameters&amp;quot;&amp;gt;&lt;br /&gt;
File:ppa2.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:ppa3.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking at the plot we can see that:&lt;br /&gt;
* The two parameters are not totally independent (see e.g. the valence quasiparticle convergence); this is the reason why we converged them simultaneously. &lt;br /&gt;
* The gap (an energy difference) converges faster than the single quasiparticle states.&lt;br /&gt;
* The convergence criterion depends on the degree of accuracy we are looking for, but considering the approximations behind the calculations (plasmon pole etc.), it is not always a good idea to enforce too strict a criterion.  &lt;br /&gt;
* Even if not fully converged, we can consider &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;=30 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=3 Ry to be reasonable parameters.&lt;br /&gt;
&lt;br /&gt;
Finally, notice that recent versions of Yambo (from 4.5 onwards) include a terminator for the dielectric matrix that accelerates convergence with the number of bands;&lt;br /&gt;
you can activate it with the &amp;lt;code&amp;gt;-V resp&amp;lt;/code&amp;gt; verbosity and by setting &amp;lt;code&amp;gt;XTermKind= &amp;quot;BG&amp;quot;&amp;lt;/code&amp;gt;. You can compare the convergence of the dielectric matrix with and without the terminator,&lt;br /&gt;
similarly to what happens for the self-energy (see the next section).&lt;br /&gt;
&lt;br /&gt;
====Converging the sum over states in the correlation self-energy====&lt;br /&gt;
From now on we will keep these parameters fixed and perform a convergence study on the sum over states in the correlation self-energy (Σc), controlled by &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to reuse the screening previously calculated, we can copy the plasmon-pole databases saved in the &amp;lt;code&amp;gt;30b_3Ry&amp;lt;/code&amp;gt; directory into the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory. In this way the screening will be read by Yambo and not calculated again:&lt;br /&gt;
&lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./SAVE/.&lt;br /&gt;
&lt;br /&gt;
(Note: you may have to delete these files before running the BSE tutorials)&lt;br /&gt;
&lt;br /&gt;
In order to use these databases we have to be sure to have the same plasmon-pole parameters in our input files.&lt;br /&gt;
Edit &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; and modify &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; in order to have a number of bands ranging from 10 to 80 in different files named &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;, &#039;&#039;gw_ppa_Gbnd20.in&#039;&#039; etc. You can also run the [[bash_scripts|generate_inputs_2.sh]] bash script to generate the required input files.&lt;br /&gt;
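&lt;br /&gt;
A minimal sketch of such a generation loop (a hedged example assuming the converged &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; file is used as a template; the actual &#039;&#039;generate_inputs_2.sh&#039;&#039; may differ):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: one input file per GbndRnge value&lt;br /&gt;
 for nb in 10 20 30 40 50 60 70 80; do&lt;br /&gt;
   sed &amp;quot;s/1 | 40 |/1 | ${nb} |/&amp;quot; gw_ppa_30b_3Ry.in &amp;gt; gw_ppa_Gbnd${nb}.in&lt;br /&gt;
 done&lt;br /&gt;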
&lt;br /&gt;
Next, launch yambo for each input:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and as done before we can inspect the obtained quasiparticle energies: &lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
for the valence bands, and &lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
for the conduction band.&lt;br /&gt;
&lt;br /&gt;
Collect the results in a text file &#039;&#039;Gbnd_conv.dat&#039;&#039; containing: the number of bands in &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;, the valence energy, and the conduction energy.&lt;br /&gt;
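&lt;br /&gt;
A possible sketch of this collection step (hypothetical, but consistent with the &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; labels used above; when grepping a single file the QP energy is given by columns 3+4):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: columns of Gbnd_conv.dat: number of bands, valence QP energy, conduction QP energy&lt;br /&gt;
 rm -f Gbnd_conv.dat&lt;br /&gt;
 for nb in 10 20 30 40 50 60 70 80; do&lt;br /&gt;
   ev=$(grep &amp;quot;7 * 8&amp;quot; o-Gbnd${nb}.qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   ec=$(grep &amp;quot;7 * 9&amp;quot; o-Gbnd${nb}.qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   echo &amp;quot;$nb $ev $ec&amp;quot; &amp;gt;&amp;gt; Gbnd_conv.dat&lt;br /&gt;
 done&lt;br /&gt;
&lt;br /&gt;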
Now, as done before we can plot the valence and conduction quasiparticle levels separately as well as the gap, as a function of the number of bands used in the summation:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2 w lp lt 7  t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3 w lp lt 7  t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2) w lp lt 7  t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect sum over states in correlation self-energy&amp;quot;&amp;gt;&lt;br /&gt;
File:Gbnd_val.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:Gbnd_cond.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:Gbnd_gap.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inspecting the plot we can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is rather slow and many bands are needed to get converged results.&lt;br /&gt;
* As observed above, the gap (an energy difference) converges faster than the single quasiparticle energies.&lt;br /&gt;
&lt;br /&gt;
====Accelerating the sum over states convergence in the correlation self-energy====&lt;br /&gt;
In general the convergence with respect to the sum over states can be very cumbersome. Here we show how it can be mitigated by using &lt;br /&gt;
a technique developed by F. Bruneval and X. Gonze &amp;lt;ref name=&amp;quot;BG&amp;quot;&amp;gt;F. Bruneval and X. Gonze, Physical Review B 78, 085125 (2008)&amp;lt;/ref&amp;gt;. The basic idea consists in replacing the eigenenergies of the states that are not treated explicitly by a common energy, and accounting for all the states not explicitly included in the calculation through the closure relation.&lt;br /&gt;
To apply this technique in Yambo we need to activate the optional terminator variable [[Variables#GTermKind|GTermKind]]. You can edit the input file by setting:&lt;br /&gt;
 &lt;br /&gt;
 [[Variables#GTermKind|GTermKind]]= &amp;quot;BG&amp;quot;&lt;br /&gt;
&lt;br /&gt;
in the file &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;, or modify this line by hand in all the other &#039;&#039;gw_ppa_GbndX.in&#039;&#039; input files (alternatively, apply the change to all files at once as sketched below).&lt;br /&gt;
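&lt;br /&gt;
This edit can be applied to all files at once with a one-liner (a sketch assuming GNU &amp;lt;code&amp;gt;sed&amp;lt;/code&amp;gt;; it appends the new line after the &amp;lt;code&amp;gt;DysSolver&amp;lt;/code&amp;gt; line of each input):&lt;br /&gt;
&lt;br /&gt;
 $ sed -i &#039;/DysSolver/a GTermKind= &amp;quot;BG&amp;quot;&#039; gw_ppa_Gbnd*.in&lt;br /&gt;
&lt;br /&gt;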
Now we can repeat the same calculations &lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10_term&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20_term&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and collect the new results:&lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
&lt;br /&gt;
in a new file called &#039;&#039;Gbnd_conv_terminator.dat&#039;&#039;. Now we can plot the same quantities as before to see the effect of introducing the terminator. &lt;br /&gt;
&lt;br /&gt;
 gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:2 w lp t &amp;quot;Valence with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:3 w lp t &amp;quot;Conduction with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:($3-$2) w lp t &amp;quot;Gap with terminator&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Effect of the terminator in convergence of QP energies with respect sum over states&amp;quot;&amp;gt;&lt;br /&gt;
File:val_t.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:cond_t.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:gap_t.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is accelerated, in particular for the single quasiparticle states.&lt;br /&gt;
* From the plot above we can see that &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;=40 is sufficient to obtain a converged gap.&lt;br /&gt;
&lt;br /&gt;
===Beyond the plasmon pole approximation: a full frequency approach - real axis integration===&lt;br /&gt;
All the calculations performed up to now were based on the plasmon-pole approximation (PPA). Now we remove this approximation by evaluating numerically the frequency integral in the expression of the correlation self-energy (Σc). To this aim we need to evaluate the screening for a number of frequencies in an interval determined by the electron-hole energy differences (energy differences between empty and occupied states) entering the sum over states. &lt;br /&gt;
Let&#039;s build the input file for a full frequency calculation by simply typing:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff.in -g n -p r&lt;br /&gt;
&lt;br /&gt;
and we can set the variables studied up to now to their converged values:&lt;br /&gt;
&lt;br /&gt;
 %[[Variables#BndsRnXd|BndsRnXd]]&lt;br /&gt;
 1 | 30 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXd|NGsBlkXd]]=3000 mRy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
 1 | 40 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]]=40000 mRy&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]] &lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 %QPkrange                      # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|8|9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and next vary, in different files (name them &#039;&#039;gw_ff10.in&#039;&#039; etc.), the number of frequencies at which we evaluate the screened Coulomb potential. This is done by setting the values of [[Variables#ETStpsXd|ETStpsXd]]=10, 50, 100, 150, 200, 250. &lt;br /&gt;
You can also run the [[bash_scripts|generate_inputs_3.sh]] bash script to generate the required input files.&lt;br /&gt;
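&lt;br /&gt;
A minimal sketch of such a loop (a hedged example assuming the &#039;&#039;gw_ff.in&#039;&#039; file edited above is used as a template; the actual &#039;&#039;generate_inputs_3.sh&#039;&#039; may differ):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: one input file per number of frequency steps&lt;br /&gt;
 for n in 10 50 100 150 200 250; do&lt;br /&gt;
   sed &amp;quot;s/ETStpsXd=.*/ETStpsXd= ${n}/&amp;quot; gw_ff.in &amp;gt; gw_ff${n}.in&lt;br /&gt;
 done&lt;br /&gt;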
&lt;br /&gt;
Next launch yambo:&lt;br /&gt;
 $ yambo -F gw_ff10.in -J ff10&lt;br /&gt;
 $ yambo -F gw_ff50.in -J ff50&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Clearly, as we are evaluating the screening at a large number of frequencies, these calculations will be heavier than the previous case, where the screening was computed at only two frequencies (zero and the plasmon-pole energy). &lt;br /&gt;
As before, collect the valence and conduction bands as a function of the number of frequencies in a file called &#039;&#039;gw_ff.dat&#039;&#039; and plot the behaviour of the conduction and valence bands and the gap.&lt;br /&gt;
 &lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Real axis integration, convergences with respect the used number of frequencies&amp;quot;&amp;gt;&lt;br /&gt;
File:ff_v.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt;&lt;br /&gt;
File:ff_c.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:ff_g.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
&lt;br /&gt;
* Oscillations are still present, indicating that even more frequencies should be considered. In general, a real-axis calculation is very demanding. &lt;br /&gt;
* The gap obtained up to now does not differ much from the one obtained at the plasmon-pole level (the difference is ~50 meV).&lt;br /&gt;
&lt;br /&gt;
As the real-axis calculation is computationally demanding, in recent years an alternative method called the Multipole Approximation (MPA) has been developed in Yambo. The MPA can be seen as an extension of the PPA, where a finite number of poles is used instead of just one. It has been shown that even a small number of poles makes the MPA results on par with a full real-axis calculation. For your reference, a tutorial on the MPA method can be found at [[Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)|https://www.yambo-code.eu/wiki/index.php/Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)]]. For now, however, let us move on to the next section.&lt;br /&gt;
&lt;br /&gt;
===Secant Solver===&lt;br /&gt;
&lt;br /&gt;
The real-axis integration also allows us to go beyond the linear expansion in solving the quasi-particle equation. &lt;br /&gt;
The QP equation is a non-linear equation whose solution must be found using a suitable numerical algorithm. &lt;br /&gt;
[[File:Eqp_sec.png|none|x22px|caption]] &lt;br /&gt;
The most commonly used approach, based on the linearization of the self-energy operator, is the Newton method, which is the one we have used up to now. &lt;br /&gt;
Yambo can also perform a search of the QP energies using a non-linear iterative method based on the [https://en.wikipedia.org/wiki/Secant_method Secant iterative Method].&lt;br /&gt;
&lt;br /&gt;
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines to better approximate a root of a function &#039;&#039;f&#039;&#039;. The secant method can be thought of as a finite difference approximation of Newton&#039;s method. &lt;br /&gt;
The equation that defines the secant method is: &lt;br /&gt;
&lt;br /&gt;
[[File:secant_eq.png|none|x35px|caption]] &lt;br /&gt;
&lt;br /&gt;
The first two iterations of the secant method are shown in the following picture. The red curve shows the function f and the blue lines are the secants.&lt;br /&gt;
&lt;br /&gt;
[[File:Secant_method.png|center|250px|caption]] &lt;br /&gt;
&lt;br /&gt;
To see if there is any non-linear effect in the solution of the Dyson equation we compare the result of the calculation using the Newton solver as done before with the present case. &lt;br /&gt;
In order to use the secant method you need to edit one of the previous &#039;&#039;gw_ffN.in&#039;&#039; files, e.g. &#039;&#039;gw_ff100.in&#039;&#039;, and replace:&lt;br /&gt;
&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;&lt;br /&gt;
with &lt;br /&gt;
 DysSolver= &amp;quot;s&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Repeat the calculations in the same way as before:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff100.in -J ff100&lt;br /&gt;
&lt;br /&gt;
Note that now the screening will &#039;&#039;not&#039;&#039; be calculated again, as it has been stored in the &#039;&#039;ffN&#039;&#039; directories, as can be seen in the report file:&lt;br /&gt;
&lt;br /&gt;
 [05] Dynamical Dielectric Matrix&lt;br /&gt;
 ================================&lt;br /&gt;
 [RD./ff10//ndb.em1d]----&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./ff10//ndb.em1d_fragment_1]--------------&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Comparing the output files, e.g. for the case with 100 freqs:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp&#039;&#039;  &#039;&#039;&#039;Newton Solver:&#039;&#039;&#039;&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
   7   8  -0.41188   -0.08708      2.91254 &lt;br /&gt;
   7   9   3.877976   1.421968    -3.417357 &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp_01&#039;&#039; &#039;&#039;&#039;Secant Solver:&#039;&#039;&#039;&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
   7   8   -0.41188   -0.08715      2.93518 &lt;br /&gt;
   7   9   3.877976   1.401408    -3.731649 &lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
From the comparison, we see that the effect is of the order of 20 meV, the same order of magnitude as the accuracy of GW calculations.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Interpolating Band Structures==&lt;br /&gt;
Up to now we have checked convergence for the gap. Now we want to calculate the quasiparticle corrections across the Brillouin zone in order to visualize the entire band structure along a path connecting high symmetry points.&lt;br /&gt;
&lt;br /&gt;
To do that we start by calculating the QP correction in the plasmon-pole approximation for all the k points of our sampling and for a number of bands around the gap. You can use a previous input file or generate a new one: &amp;lt;code&amp;gt; yambo -F gw_ppa_all_Bz.in -x -p p -g n &amp;lt;/code&amp;gt; and set the parameters found in the previous tests:&lt;br /&gt;
&lt;br /&gt;
 EXXRLvcs=  40        Ry &lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 30 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 3000            mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
and we calculate it for all the available kpoints by setting:&lt;br /&gt;
 %QPkrange                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  1| 14|  6|11|&lt;br /&gt;
 %&lt;br /&gt;
 &lt;br /&gt;
Note that, as we have already calculated the screening with these parameters, you can save time and reuse that database: either run with &amp;lt;code&amp;gt; -J 30b_3Ry &amp;lt;/code&amp;gt; so that the existing directory is used, or, if you prefer to keep the new databases in a separate all_Bz directory, create it and copy the screening databases there:&lt;br /&gt;
&lt;br /&gt;
 $ mkdir all_Bz &lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./all_Bz/&lt;br /&gt;
&lt;br /&gt;
and launch the calculation:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_all_Bz.in -J all_Bz&lt;br /&gt;
&lt;br /&gt;
Now we can inspect the output and see that it contains the corrections for all the k-points and bands we asked for:&lt;br /&gt;
&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
    1.000000     6.000000    -1.299712    -0.219100     3.788044&lt;br /&gt;
    1.000000     7.000000    -1.296430    -0.241496     3.788092&lt;br /&gt;
    1.000000     8.000000    -1.296420    -0.243115     3.785947&lt;br /&gt;
    1.000000     9.000000     4.832399     0.952386    -3.679259&lt;br /&gt;
    1.00000     10.00000     10.76416      2.09915     -4.38743&lt;br /&gt;
    1.00000     11.00000     11.36167      2.48053     -3.91021&lt;br /&gt;
....&lt;br /&gt;
By plotting some of the &#039;&#039;o-all_Bz.qp&#039;&#039; columns it is possible to discuss some physical properties of the hBN quasiparticles. Using columns 3 and (3+4), i.e. plotting the GW energies with respect to the LDA energies, we can deduce the band-gap renormalization and the stretching of the conduction/valence bands:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p  t &amp;quot;Eqp vs Elda&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[File:EqpvE0.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
Essentially we can see that the effect of the GW self-energy is the opening of the gap and a linear stretching of the conduction/valence bands, which can be estimated by performing a linear fit of the positive and negative energies separately (the zero is set at the top of the valence band). &lt;br /&gt;
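&lt;br /&gt;
For instance, the stretching coefficients can be estimated with a small least-squares fit done directly from the command line (a hedged sketch using &amp;lt;code&amp;gt;awk&amp;lt;/code&amp;gt;; column 3 contains the LDA energy Eo and columns 3+4 the GW energy):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: slope of Eqp vs Elda for conduction (Eo &amp;gt; 0) and valence (Eo &amp;lt; 0) states&lt;br /&gt;
 awk &#039;!/^ *#/ &amp;&amp; $3 &amp;gt; 0 { n++; x=$3; y=$3+$4; sx+=x; sy+=y; sxx+=x*x; sxy+=x*y }&lt;br /&gt;
      END { print &amp;quot;conduction stretching:&amp;quot;, (n*sxy-sx*sy)/(n*sxx-sx*sx) }&#039; o-all_Bz.qp&lt;br /&gt;
 awk &#039;!/^ *#/ &amp;&amp; $3 &amp;lt; 0 { n++; x=$3; y=$3+$4; sx+=x; sy+=y; sxx+=x*x; sxy+=x*y }&lt;br /&gt;
      END { print &amp;quot;valence stretching:&amp;quot;, (n*sxy-sx*sy)/(n*sxx-sx*sx) }&#039; o-all_Bz.qp&lt;br /&gt;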
&lt;br /&gt;
In order to calculate the band structure, however, we need to interpolate the values we have calculated above on a given path. In Yambo the interpolation is done by the executable &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; (Yambo Post Processing).&lt;br /&gt;
By typing: &lt;br /&gt;
 $ ypp -h  &lt;br /&gt;
you will recognize that in order to interpolate the bands we need to build a ypp input file using&lt;br /&gt;
 $ ypp -s b&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Before editing the &#039;&#039;ypp.in&#039;&#039; input file and running the interpolation, it is important to know that &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; uses an algorithm &amp;lt;ref&amp;gt; Warren E. Pickett, Henry Krakauer, and Philip B. Allen Phys. Rev. B 38, 2721 &amp;lt;/ref&amp;gt;  that cannot be used in presence of time-reversal (TR) symmetry. &lt;br /&gt;
As a first step we therefore remove the TR symmetry by typing:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -y&lt;br /&gt;
&lt;br /&gt;
and we uncomment the corresponding line to remove the TR.&lt;br /&gt;
&lt;br /&gt;
 fixsyms                      # [R] Reduce Symmetries&lt;br /&gt;
 #RmAllSymm                   # Remove all symmetries&lt;br /&gt;
 #RmTimeRev                   # Remove Time Reversal&lt;br /&gt;
&lt;br /&gt;
and launch&lt;br /&gt;
 &lt;br /&gt;
 $ ypp&lt;br /&gt;
&lt;br /&gt;
This will create a new directory called &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; where a &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory containing the electronic structure in the absence of TR is present. &lt;br /&gt;
We will calculate the band structure in the &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
 $ cd FixSymm&lt;br /&gt;
&lt;br /&gt;
After having performed the usual setup&lt;br /&gt;
&lt;br /&gt;
 $yambo&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
we can generate the input for the band interpolation:&lt;br /&gt;
&lt;br /&gt;
 $ypp -s b -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
and edit the &#039;&#039;ypp_bands.in&#039;&#039; file:&lt;br /&gt;
&lt;br /&gt;
 electrons                    # [R] Electrons (and holes)&lt;br /&gt;
 bnds                         # [R] Bands&lt;br /&gt;
 INTERP_mode= &amp;quot;BOLTZ&amp;quot;         # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach) &lt;br /&gt;
 OutputAlat= 4.716000           # [a.u.] Lattice constant used for &amp;quot;alat&amp;quot; ouput format&lt;br /&gt;
 cooIn= &amp;quot;rlu&amp;quot;                   # Points coordinates (in) cc/rlu/iku/alat&lt;br /&gt;
 cooOut= &amp;quot;rlu&amp;quot;     &lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   1 | 100 |                   # Number of bands&lt;br /&gt;
 %&lt;br /&gt;
 INTERP_Shell_Fac= 20.00000     # The bigger it is a higher number of shells is used&lt;br /&gt;
 CIRCUIT_E_DB_path= &amp;quot;none&amp;quot;      # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)&lt;br /&gt;
 BANDS_path= &amp;quot;&amp;quot;                 # BANDS path points labels (G,M,K,L...)&lt;br /&gt;
 BANDS_steps= 10  &lt;br /&gt;
 #BANDS_built_in                # Print the bands of the generating points of the circuit using the nearest internal point&lt;br /&gt;
 %BANDS_kpts   &lt;br /&gt;
 % &lt;br /&gt;
&lt;br /&gt;
We modify the following lines:&lt;br /&gt;
 BANDS_steps=30&lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   6 | 11 |                   # Number of bands &lt;br /&gt;
 %&lt;br /&gt;
 %BANDS_kpts                    # K points of the bands circuit&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.00000 |0.00000 |0.00000 |&lt;br /&gt;
  0.50000 |-.50000 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.50000 |&lt;br /&gt;
  0.00000 |0.00000 |0.50000 |&lt;br /&gt;
  0.50000 |-.50000 |0.50000 |&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
which means we assign 30 points to each segment, we ask to interpolate 3 occupied and 3 empty bands, and we assign a path passing through the high-symmetry points K Γ M K H A L.&lt;br /&gt;
Launching:&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
will produce the output file &#039;&#039;o.bands_interpolated&#039;&#039; containing:&lt;br /&gt;
&lt;br /&gt;
                 &lt;br /&gt;
 #&lt;br /&gt;
 #   |k|        b6         b7         b8         b9         b10        b11        kx         ky         kz&lt;br /&gt;
 #&lt;br /&gt;
 #&lt;br /&gt;
     0.00000   -7.22092   -0.13402   -0.13395    4.67691    4.67694   10.08905    0.33300   -0.66667    0.00000&lt;br /&gt;
     0.03725   -7.18857   -0.17190   -0.12684    4.66126    4.71050   10.12529    0.32190   -0.64445    0.00000&lt;br /&gt;
...&lt;br /&gt;
 &lt;br /&gt;
and we can plot the bands using gnuplot:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o.bands_interpolated&amp;quot; u 0:2 w l, &amp;quot;o.bands_interpolated&amp;quot; u 0:3 w l, ...&lt;br /&gt;
&lt;br /&gt;
[[File:bands_lda.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
and you can recognize the index of the high symmetry point by inspecting the last three columns.&lt;br /&gt;
Note that up to now we have interpolated the LDA band structure. In order to plot the GW band structure, we need to tell &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; in the input file where the &#039;&#039;ndb.QP&#039;&#039; database is found. This is achieved by adding in the &#039;&#039;ypp_bands.in&#039;&#039; file the line:&lt;br /&gt;
&lt;br /&gt;
  GfnQPdb= &amp;quot;E &amp;lt; ./all_Bz/ndb.QP&amp;quot;&lt;br /&gt;
&lt;br /&gt;
and relaunch &lt;br /&gt;
&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
Now the file &#039;&#039;o.bands_interpolated_01&#039;&#039; contains the GW interpolated band structure. We can plot the LDA and GW band structure together by using the gnuplot script [[gnuplot_scripts|bands.gnu]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines  widths=500px heights=500px  perrow=2 caption=&amp;quot;Band structure of bulk hBN&amp;quot;&amp;gt;&lt;br /&gt;
File:hBN_bands.png| LDA and GW band structure &lt;br /&gt;
File:hBN_bands_lit.png| LDA and GW band structure from Ref. &amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*As expected the effect of the GW correction is to open the gap.&lt;br /&gt;
*Comparing the obtained band structure with the one found in the literature by Arnaud and coworkers &amp;lt;ref name=&amp;quot;Arnaud&amp;quot;&amp;gt;B. Arnaud, S. Lebegue, P. Rabiller, and M. Alouani, Phys. Rev. Lett. 96, 026402 (2006)&amp;lt;/ref&amp;gt; we find a very nice qualitative agreement. &lt;br /&gt;
*Quantitatively we find a smaller gap: about 5.2 eV (indirect) and 5.7 eV (direct), while Ref.&amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; reports 5.95 eV for the indirect gap and a minimum direct band gap of 6.47 eV. Other values are also reported in the literature, depending on the pseudopotentials used, the starting functional and the type of self-consistency (see below). &lt;br /&gt;
*The present tutorial has been run with a small k-point grid; this is an important parameter, so convergence with respect to the k-point sampling also has to be validated.&lt;br /&gt;
&lt;br /&gt;
==Step 4: Summary of the convergence parameters==&lt;br /&gt;
We have calculated the band structure of hBN starting from a DFT calculation; here we summarize the main variables we have checked to achieve convergence:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; # [XX] Exchange RL components&lt;br /&gt;
Number of G-vectors in the exchange. This number should be checked carefully: generally a large number is needed, as the QP energies show a slow convergence. The calculation of the exchange part is rather fast. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; #[Xp] Polarization function bands&lt;br /&gt;
Number of bands in the independent-particle response function from which the dielectric matrix is calculated. This parameter also has to be checked carefully, together with NGsBlkXp, as the two variables are interconnected.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;  # [Xp] Response block size&lt;br /&gt;
Number of G-vectors (block size) of the dielectric matrix. This parameter also has to be checked carefully, together with BndsRnXp. A large number of bands and a large block size can make the calculation very demanding.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]] &amp;lt;/code&amp;gt; # [Xp] [cc] Electric Field&lt;br /&gt;
Direction of the electric field for the calculation of the q=0 component of the dielectric function ε(q,ω). In a bulk system it can be set to (1,1,1); attention must be paid for non-3D systems.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]] &amp;lt;/code&amp;gt; # [Xp] Plasmon-pole imaginary energy: this is the second frequency used to fit the Godby-Needs plasmon-pole model (PPM). If the results depend significantly on this frequency, the PPM is not adequate for your calculation and you need to go beyond it, e.g. with real-axis integration. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]] &amp;lt;/code&amp;gt; # [GW] G[W] bands range&lt;br /&gt;
Number of bands used to expand the Green&#039;s function. This number is usually larger than the number of bands used to calculate the dielectric matrix. Single quasiparticle energies converge slowly with respect to GbndRnge; energy differences behave better. You can use the terminator technique to mitigate the slow convergence. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GDamping|GDamping]] &amp;lt;/code&amp;gt; # [GW] G[W] damping&lt;br /&gt;
Small damping (the delta parameter) in the Green&#039;s function definition. The final result should not depend on it; it is usually set to 0.1 eV.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#dScStep|dScStep]] &amp;lt;/code&amp;gt; # [GW] &lt;br /&gt;
Energy step to evaluate Z factors&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#DysSolver|DysSolver]] &amp;lt;/code&amp;gt; # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
Parameter selecting how the Dyson equation is solved: &amp;quot;n&amp;quot; Newton linearization, &amp;quot;s&amp;quot; non-linear secant method.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GTermKind|GTermKind]] &amp;lt;/code&amp;gt; # [GW] GW terminator &lt;br /&gt;
Terminator for the self-energy&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;. We have seen how this speeds up the convergence with respect to empty bands.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]] &amp;lt;/code&amp;gt; # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
K-points and band range where you want to calculate the GW correction. The syntax is&lt;br /&gt;
first kpoint | last kpoint | first band | last band&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8662</id>
		<title>GW on h-BN (standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8662"/>
		<updated>2025-05-18T13:23:08Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a modified version of the full tutorial on GW computations present on the Yambo wiki. Later, you can have a look at the extended version at [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will learn how to:&lt;br /&gt;
* Calculate quasi-particle corrections in the Hartree-Fock approximation &lt;br /&gt;
* Calculate quasi-particle corrections in the GW approximation &lt;br /&gt;
* Choose the input parameters for a meaningful converged calculation&lt;br /&gt;
* Plot a band structure including quasi-particle corrections&lt;br /&gt;
We will use bulk hBN as an example system. Before starting, you need to obtain the appropriate tarball. See instructions on the [[Tutorials|main tutorials page]]. &amp;lt;br&amp;gt;&lt;br /&gt;
We strongly recommend that you first complete the [[First steps: a walk through from DFT to optical properties]] tutorial.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &#039;&#039;&#039;Prerequisites&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
At this stage, you should have already completed the following modules:&lt;br /&gt;
* [[Generating the Yambo databases|Generating the Yambo databases]]&lt;br /&gt;
* Step 2 Run [[Initialization]] tutorial&lt;br /&gt;
Now we can start to --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The aim of the present tutorial is to obtain quasiparticle correction to energy levels using many-body perturbation theory (MBPT). &amp;lt;br&amp;gt;&lt;br /&gt;
The general non-linear quasiparticle equation reads:&lt;br /&gt;
[[File:Eqp_1.png|none|x26px|caption]] &lt;br /&gt;
As a first step we want to evaluate the self energy Σ entering in the quasiparticle equation. In the GW approach the self-energy can be separated into two components: a static term called the exchange self-energy (Σx), and a dynamical term (energy dependent) called the correlation self-energy (Σc):&lt;br /&gt;
[[File:Sigma.png|none|x25px|caption]]&lt;br /&gt;
We will treat these two terms separately and demonstrate how to set the most important variables for calculating each term.&lt;br /&gt;
In practice we will compute  the quasi-particle corrections to the one particle Kohn-Sham eigenvalues obtained through a DFT calculation. &lt;br /&gt;
&lt;br /&gt;
The steps are the following:&lt;br /&gt;
&lt;br /&gt;
==Step 1: The Exchange Self Energy or HF quasi-particle correction==&lt;br /&gt;
&lt;br /&gt;
We start by evaluating the exchange Self-Energy and the corresponding Quasiparticle energies (Hartree-Fock energies). &lt;br /&gt;
Follow the module on &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; and then return to this tutorial &#039;&#039;How to obtain the quasiparticle band structure of a bulk material: h-BN&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Step 2: The Correlation Self-Energy and Quasiparticle Energies==&lt;br /&gt;
Once we have calculated the exchange part, we next turn our attention to the more demanding dynamical part. The correlation part of the self-energy in a plane wave representation reads:&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
In the expression for the correlation self energy, we have (1) a summation over bands, (2) an integral over the Brillouin Zone, and (3) a sum over the G vectors. In contrast with the case of Σx, the summation over bands extends over &#039;&#039;all&#039;&#039; bands (including the unoccupied ones), and so convergence tests are needed. Another important difference is that the Coulomb interaction is now screened so a fundamental ingredient is the evaluation of the dynamical dielectric matrix. The expression for the dielectric matrix, calculated at the RPA level and including local field effects, has been already treated in the [[Local fields|Local fields]] tutorial.&lt;br /&gt;
&lt;br /&gt;
In the following, we will see two ways to take into account the dynamical effects. First, we will see how to set the proper parameters to obtain a model dielectric function based on a widely used approximation, which models the energy dependence of each component of the dielectric matrix with a single pole function. &lt;br /&gt;
Secondly, we will see how to perform calculations by evaluating the dielectric matrix on a regular grid of frequencies. &lt;br /&gt;
&lt;br /&gt;
Once the correlation part of the self-energy is calculated, we will check the convergence of the different parameters with respect to some final quantity, such as the gap. &lt;br /&gt;
&lt;br /&gt;
After computing the frequency dependent self-energy, we will discover that in order to solve the quasiparticle equation we will need to know its value &#039;&#039;at the value of the quasiparticle itself&#039;&#039;. In the following, unless explicitly stated, we will solve the non-linear quasi-particle equation at first order, by expanding the self-energy around the Kohn-Sham eigenvalue. In this way the quasiparticle equation reads:&lt;br /&gt;
&lt;br /&gt;
[[File:Eqp_2.png|none|x26px|caption]] &lt;br /&gt;
&lt;br /&gt;
where the normalization factor Z is defined as:&lt;br /&gt;
&lt;br /&gt;
[[File:z_fac.png|none|x40px|caption]] &lt;br /&gt;
&lt;br /&gt;
===The Plasmon Pole approximation===&lt;br /&gt;
As stated above, the basic idea of the plasmon-pole approximation is to approximate the frequency dependence of the dielectric matrix with a single pole function of the form:&lt;br /&gt;
[[File:ppa.png|none|x26px|caption]]&lt;br /&gt;
The two parameters R&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; and Ω&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; are obtained by a fit (for each component), after having calculated the RPA dielectric matrix at two given frequencies.&lt;br /&gt;
Yambo calculates the dielectric matrix in the static limit ( ω=0) and at a user defined frequency called the plasmon-pole frequency (ω=iωp). &lt;br /&gt;
Such an approximation has the big computational advantage of calculating the dielectric matrix for only two frequencies and leads to an analytical expression for the frequency integral of the correlation self-energy.&lt;br /&gt;
==== Input file generation ====&lt;br /&gt;
Let&#039;s start by building up the input file for a GW/PPA calculation, including the calculation of the exchange self-energy. From &amp;lt;code&amp;gt;yambo -H&amp;lt;/code&amp;gt; you should understand that the correct option is &amp;lt;code&amp;gt;yambo -x -p p -g n&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo -x -p p -g n -F gw_ppa.in&lt;br /&gt;
&lt;br /&gt;
Let&#039;s modify the input file in the following way: &lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] = 40         Ry    # [XX] Exchange RL components&lt;br /&gt;
 [[Variables#VXCRLvcs|VXCRLvcs]] = 3187        RL      # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;                   # [X] IP/Hartree/ALDA/LRC/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 10 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]]= 1000          mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#PPAPntXp|PPAPntXp]] = 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GDamping=  0.10000     eV    # [GW] G[W] damping&lt;br /&gt;
 dScStep=  0.10000      eV    # [GW] Energy step to evaluate Z factors&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;               # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  7|  7|  8|  9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Brief explanation of some settings:&lt;br /&gt;
* Similar to the Hartree Fock study, we will concentrate on the convergence of the &#039;&#039;&#039;direct&#039;&#039;&#039; gap of the system. Hence we select the last occupied (8) and first unoccupied (9) bands for k-point number 7 in the &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable. &lt;br /&gt;
* We also keep &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; at its converged value of 40 Ry as obtained in the &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; tutorial.&lt;br /&gt;
* For the moment we keep fixed the plasmon pole energy &amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]]&amp;lt;/code&amp;gt; at its default value (=1 Hartree).&lt;br /&gt;
* We keep fixed the direction of the electric field for the evaluation of the dielectric matrix to a non-specific value: &amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]]&amp;lt;/code&amp;gt;=(1,1,1).&lt;br /&gt;
* Later we will study convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;; for now just set them to the values indicated.&lt;br /&gt;
&lt;br /&gt;
==== Understanding the output ====&lt;br /&gt;
Let&#039;s look at the typical Yambo output. Run Yambo with an appropriate &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
   &lt;br /&gt;
 $ yambo -F gw_ppa.in -J 10b_1Ry&lt;br /&gt;
&lt;br /&gt;
In the standard output you can recognise the different steps of the calculation: calculation of the screening matrix (evaluation of the non-interacting and interacting response), calculation of the exchange self-energy, and finally the calculation of the correlation self-energy and quasiparticle energies. Moreover, information on memory usage and execution time is reported: &lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Dynamic Dielectric Matrix (PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;08s&amp;gt; Xo@q[3] |########################################| [100%] 03s(E) 03s(X)&lt;br /&gt;
 &amp;lt;08s&amp;gt; X@q[3] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [06] Bare local and non-local Exchange-Correlation&lt;br /&gt;
 &amp;lt;43s&amp;gt; EXS |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07] Dyson equation: Newton solver&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07.01] G0W0 (W PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; G0W0 (W PPA) |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; [07.02] QP properties and I/O&lt;br /&gt;
 &amp;lt;45s&amp;gt; [08] Game Over &amp;amp; Game summary&lt;br /&gt;
&lt;br /&gt;
Let&#039;s have a look at the report and output. The output file &#039;&#039;o-10b_1Ry.qp&#039;&#039; contains (for each band and k-point that we indicated in the input file) the values of the bare KS eigenvalue, its GW correction and the correlation part of the self energy:&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
  7     8       -0.411876  -0.567723   2.322443&lt;br /&gt;
  7     9        3.877976   2.413773  -2.232241&lt;br /&gt;
&lt;br /&gt;
In the header you can see the details of the calculation; for instance, it reports that &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=1 Ry corresponds to 5 G-vectors: &lt;br /&gt;
&lt;br /&gt;
 #  X G`s            [used]:  5&lt;br /&gt;
&lt;br /&gt;
Other information can be found in the report file &#039;&#039;r-10b_1Ry_em1d_ppa_HF_and_locXC_gw0&#039;&#039;, such as the renormalization factor defined above, the value of the &#039;&#039;exchange&#039;&#039; self-energy (non-local XC) and of the DFT exchange-correlation potential (local XC): &lt;br /&gt;
&lt;br /&gt;
 [07.02] QP properties and I/O&lt;br /&gt;
  =============================&lt;br /&gt;
  Legend (energies in eV):&lt;br /&gt;
  - B  : Band       - Eo  : bare energy&lt;br /&gt;
  - E  : QP energy  - Z   : Renormalization factor&lt;br /&gt;
  - So : Sc(Eo)     - S   : Sc(E)&lt;br /&gt;
  - dSp: Sc derivative precision&lt;br /&gt;
  - lXC: Starting Local XC (DFT)&lt;br /&gt;
  -nlXC: Starting non-Local XC (HF)&lt;br /&gt;
  QP [eV] @ K [7] (iku): 0.000000 -0.500000  0.000000&lt;br /&gt;
   B=8 Eo= -0.41 E= -0.98 E-Eo= -0.56 Re(Z)=0.81 Im(Z)=-.2368E-2 nlXC=-19.13 lXC=-16.11 So= 2.322&lt;br /&gt;
   B=9 Eo=  3.88 E=  6.29 E-Eo=  2.41 Re(Z)=0.83 Im(Z)=-.2016E-2 nlXC=-5.536 lXC=-10.67 So=-2.232&lt;br /&gt;
&lt;br /&gt;
Extended information can also be printed in the output by activating the &amp;lt;code&amp;gt;[[Variables#ExtendOut|ExtendOut]]&amp;lt;/code&amp;gt; flag in the input file. This optional flag becomes available when the &amp;lt;code&amp;gt;-V qp&amp;lt;/code&amp;gt; verbosity option is added while building the input file. The plasmon-pole screening, the exchange self-energy and the quasiparticle energies are also saved in databases so that they can be reused in further runs:&lt;br /&gt;
&lt;br /&gt;
 $ ls ./10b_1Ry&lt;br /&gt;
 ndb.pp ndb.pp_fragment_1 ... ndb.HF_and_locXC ndb.QP&lt;br /&gt;
&lt;br /&gt;
===Convergence tests for a quasi particle calculation===&lt;br /&gt;
&lt;br /&gt;
Now we can check the convergence of the different variables entering the expression of the correlation part of the self-energy.&amp;lt;br&amp;gt; &lt;br /&gt;
First, we focus on the parameter governing the &#039;&#039;screening matrix&#039;&#039; you have already seen in the [[RPA/IP]] section. In contrast to the calculation of the [[RPA/IP]] dielectric function, where you considered either the optical limit or a finite q response (EELS), here the dielectric matrix will be calculated for &#039;&#039;all&#039;&#039; q-points determined by the choice of k-point sampling.&lt;br /&gt;
 &lt;br /&gt;
The parameters that need to be converged can be understood by looking at the expression of the dielectric matrix:&lt;br /&gt;
[[File:Yambo-CH5.png|none|x30px|Yambo tutorial image]]&lt;br /&gt;
where &#039;&#039;&amp;amp;chi;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&#039;&#039; is given by&lt;br /&gt;
[[File:Dyson_rpa.png|none|x40px|Yambo tutorial image]]&lt;br /&gt;
and  &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; is given by&lt;br /&gt;
[[File:ChiO.png|none|x45px|Yambo tutorial image]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; : The dimension of the microscopic inverse matrix, related to [[Local fields]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; : The sum on bands (c,v) in the independent particle &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Converging Screening Parameters====&lt;br /&gt;
Here we will check the convergence of the gap starting from the variables controlling the screening reported above: the bands employed to build the RPA response function &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and the number of blocks (G,G&#039;) of the dielectric matrix ε&amp;lt;sup&amp;gt;-1&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;G,G&#039;&amp;lt;/sub&amp;gt;  &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
In the next section we will study convergence with respect to the sum over states (sum over &#039;&#039;m&#039;&#039; in the Σ&amp;lt;sub&amp;gt;c&amp;lt;/sub&amp;gt; expression); here let&#039;s fix &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; to a reasonable value (40 bands). &lt;br /&gt;
&lt;br /&gt;
Let&#039;s build a series of input files differing by the values of bands and block sizes in &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; considering &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in the range 10-50 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; in the range 1000 to 5000 mRy. To do this by hand, file by file, open the &#039;&#039;gw_ppa.in&#039;&#039; file in an editor and change to:&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]] = &#039;&#039;&#039;2000 mRy&#039;&#039;&#039;&lt;br /&gt;
while leaving the rest untouched. Repeat for 3000 mRy, 4000 mRy etc. Next, for each &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; change to:&lt;br /&gt;
 % [[Variables#BndsRnXp|BndsRnXp]]&lt;br /&gt;
   1 | 20 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
and repeat for 30, 40 and so on. Give a &#039;&#039;&#039;different name&#039;&#039;&#039; to each file: &#039;&#039;gw_ppa_Xb_YRy.in&#039;&#039; with X=10,20,30,40 and Y=1000,2000,3000,4000,5000 mRy.&lt;br /&gt;
&lt;br /&gt;
This is obviously quite tedious. However, you can automate both the input construction and the code execution using bash or python scripts (indeed, later you will learn how to use the yambo-python [http://www.yambo-code.org/wiki/index.php?title=GW_tutorial._Convergence_and_approximations_(BN)] tool for this task). For now, you can use the simple [[bash_scripts|generate_inputs_1.sh]] bash script to generate the required input files (after copying the script you need to make it executable with &amp;lt;code&amp;gt;$ chmod +x name_of_the_script.sh&amp;lt;/code&amp;gt;). A minimal sketch of what such a script might look like is shown below.&lt;br /&gt;
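&lt;br /&gt;
The following is only a minimal sketch of such a generation script (it is &#039;&#039;not&#039;&#039; the wiki script itself): it assumes the edited template &#039;&#039;gw_ppa.in&#039;&#039; shown above, i.e. that the file contains the &amp;lt;code&amp;gt;NGsBlkXp&amp;lt;/code&amp;gt; line and the &amp;lt;code&amp;gt;1 | 10 |&amp;lt;/code&amp;gt; line inside the &amp;lt;code&amp;gt;BndsRnXp&amp;lt;/code&amp;gt; block, and edits them with &amp;lt;code&amp;gt;sed&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: generate the gw_ppa_Xb_YRy.in files from the gw_ppa.in template&lt;br /&gt;
 # (assumes the template was edited as shown above).&lt;br /&gt;
 for nb in 10 20 30 40 50; do&lt;br /&gt;
   for ry in 1 2 3 4 5; do&lt;br /&gt;
     sed -e &amp;quot;s/NGsBlkXp=.*/NGsBlkXp= ${ry}000 mRy/&amp;quot; \&lt;br /&gt;
         -e &amp;quot;/BndsRnXp/{n;s/| *10 *|/| ${nb} |/;}&amp;quot; \&lt;br /&gt;
         gw_ppa.in &amp;gt; gw_ppa_${nb}b_${ry}Ry.in&lt;br /&gt;
   done&lt;br /&gt;
 done&lt;br /&gt;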
&lt;br /&gt;
Finally launch the calculations:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_10b_1Ry.in -J 10b_1Ry&lt;br /&gt;
 $ yambo -F gw_ppa_10b_2Ry.in -J 10b_2Ry&lt;br /&gt;
 ...&lt;br /&gt;
 $ yambo -F gw_ppa_40b_5Ry.in -J 40b_5Ry&lt;br /&gt;
&lt;br /&gt;
Once the jobs are terminated we can collect the quasiparticle energies for fixed &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in different files named e.g. &#039;&#039;10b.dat, 20b.dat&#039;&#039; etc. for plotting, by putting in separate columns: the energy cutoff; the size of the G blocks; the quasiparticle energy of the valence band; and that of the conduction band.&lt;br /&gt;
To do this e.g. for the &#039;&#039;10b.dat&#039;&#039; file you can type:&lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 8&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
to parse the valence band quasiparticle energies  and &lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;7 * 9&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
for the conduction band, and put all the data in the &#039;&#039;10b.dat&#039;&#039; file. As there are many files to process, you can use the [[bash_scripts|parse_qps.sh]] script to create the &#039;&#039;10b.dat&#039;&#039; file, and then edit the script, changing the number of &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;, to produce the other data files. A minimal sketch of such a parsing script is shown below. &lt;br /&gt;
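&lt;br /&gt;
As a rough sketch only (assuming the output files are named &#039;&#039;o-10b_1Ry.qp&#039;&#039; ... &#039;&#039;o-10b_5Ry.qp&#039;&#039; and follow the header and column layout shown above; the wiki [[bash_scripts|parse_qps.sh]] script may differ in the details):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: collect cutoff, number of G-vectors, valence and conduction QP energies&lt;br /&gt;
 rm -f 10b.dat&lt;br /&gt;
 for ry in 1 2 3 4 5; do&lt;br /&gt;
   f=o-10b_${ry}Ry.qp&lt;br /&gt;
   ng=$(grep &amp;quot;X G&amp;quot; ${f} | head -1 | awk &#039;{print $NF}&#039;)   # G-vectors used in the response&lt;br /&gt;
   ev=$(grep &amp;quot;7 * 8&amp;quot; ${f} | awk &#039;{print $3+$4}&#039;)          # valence QP energy (Eo + E-Eo)&lt;br /&gt;
   ec=$(grep &amp;quot;7 * 9&amp;quot; ${f} | awk &#039;{print $3+$4}&#039;)          # conduction QP energy&lt;br /&gt;
   echo &amp;quot;${ry} ${ng} ${ev} ${ec}&amp;quot; &amp;gt;&amp;gt; 10b.dat&lt;br /&gt;
 done&lt;br /&gt;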
&lt;br /&gt;
Once we have collected all the quasiparticle values we can plot the gap, or the valence and conduction band energies separately, as a function of the block size or energy cutoff:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:3 w lp t &amp;quot; Valence BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:3 w lp t &amp;quot;Valence BndsRnXp=20&amp;quot;,.. &lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:4 w lp t &amp;quot; Conduction BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:4 w lp t &amp;quot;Conduction BndsRnXp=20&amp;quot;,..&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot; Gap BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot;gap BndsRnXp=20&amp;quot;,..  &lt;br /&gt;
or both using e.g. the [[gnuplot_scripts|ppa_gap.gnu]] gnuplot script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect screening parameters&amp;quot;&amp;gt;&lt;br /&gt;
File:ppa2.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:ppa3.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking at the plot we can see that:&lt;br /&gt;
* The two parameters are not totally independent (see e.g. the valence quasiparticle convergence) and this is the reason why we converged them simultaneously &lt;br /&gt;
* The gap (energy difference) converges faster than the single quasiparticle states&lt;br /&gt;
* The convergence criterion depends on the degree of accuracy we are looking for, but considering the approximations behind the calculations (plasmon-pole etc.), it is not always a good idea to enforce too strict a criterion.  &lt;br /&gt;
* Even if not fully converged, we can consider &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;=30 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=3 Ry to be reasonable parameters.&lt;br /&gt;
&lt;br /&gt;
Finally, notice that recent versions of Yambo (from 4.5 onwards) provide a terminator for the dielectric constant that accelerates convergence with the number of bands:&lt;br /&gt;
you can activate it with the &amp;lt;code&amp;gt;-V resp&amp;lt;/code&amp;gt; verbosity and by setting &amp;lt;code&amp;gt;XTermKind= &amp;quot;BG&amp;quot;&amp;lt;/code&amp;gt;. You can then compare the convergence of the dielectric constant with and without the terminator,&lt;br /&gt;
similar to what happens for the self-energy (see the next section).&lt;br /&gt;
&lt;br /&gt;
====Converging the sum over states in the correlation self-energy====&lt;br /&gt;
From now on we will keep these parameters fixed and perform a convergence study on the sum over states in the correlation self-energy (Σc), controlled by &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to reuse the screening previously calculated, we can copy the plasmon-pole database saved in the &amp;lt;code&amp;gt;30b_3Ry&amp;lt;/code&amp;gt; directory into the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory. In this way the screening will be read by Yambo and not calculated again:&lt;br /&gt;
&lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./SAVE/.&lt;br /&gt;
&lt;br /&gt;
(Note: you may have to delete these files before running the BSE tutorials)&lt;br /&gt;
&lt;br /&gt;
In order to reuse the databases we have to be sure to have the same plasmon-pole parameters in our input files.&lt;br /&gt;
Edit &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; and modify &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; in order to have a number of bands ranging from 10 to 80 in different files named &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;, &#039;&#039;gw_ppa_Gbnd20.in&#039;&#039; etc. You can also run the [[bash_scripts|generate_inputs_2.sh]] bash script to generate the required input files; a minimal sketch of such a loop is shown below.&lt;br /&gt;
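&lt;br /&gt;
As a rough sketch only (assuming &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; contains the &amp;lt;code&amp;gt;GbndRnge&amp;lt;/code&amp;gt; block with the line &amp;lt;code&amp;gt;1 | 40 |&amp;lt;/code&amp;gt;; the wiki script may differ):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # Sketch: change the upper band index of the GbndRnge block, one file per value&lt;br /&gt;
 for nb in 10 20 30 40 50 60 70 80; do&lt;br /&gt;
   sed &amp;quot;/GbndRnge/,/^%/ s/| *40 *|/| ${nb} |/&amp;quot; gw_ppa_30b_3Ry.in &amp;gt; gw_ppa_Gbnd${nb}.in&lt;br /&gt;
 done&lt;br /&gt;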
&lt;br /&gt;
Next, launch yambo for each input:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and as done before we can inspect the obtained quasiparticle energies: &lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
for the valence band, and &lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
for the conduction band.&lt;br /&gt;
&lt;br /&gt;
Collect the results in a text file &#039;&#039;Gbnd_conv.dat&#039;&#039; containing: the number of Gbnd, the valence energy, and the conduction energy.&lt;br /&gt;
Now, as done before we can plot the valence and conduction quasiparticle levels separately as well as the gap, as a function of the number of bands used in the summation:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2 w lp lt 7  t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3 w lp lt 7  t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2) w lp lt 7  t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect sum over states in correlation self-energy&amp;quot;&amp;gt;&lt;br /&gt;
File:Gbnd_val.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:Gbnd_cond.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:Gbnd_gap.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inspecting the plot we can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; is rather slow and many bands are needed to get converged results.&lt;br /&gt;
* As observed above the gap (energy difference) converges faster than the single quasiparticle state energies.&lt;br /&gt;
&lt;br /&gt;
====Accelerating the sum over states convergence in the correlation self-energy====&lt;br /&gt;
In general the convergence with respect to the sum over states can be very cumbersome. Here we show how it can be mitigated by using &lt;br /&gt;
a technique developed by F. Bruneval and X. Gonze &amp;lt;ref name=&amp;quot;BG&amp;quot;&amp;gt; F. Bruneval and X. Gonze, Physical Review B 78, 085125 (2008)&amp;lt;/ref&amp;gt;. The basic idea is to replace the eigenenergies of the states that are not treated explicitly by a common energy, and to take all of these states into account through the closure relation.&lt;br /&gt;
To apply this technique in Yambo we need to activate the optional terminator variable [[Variables#GTermKind|GTermKind]], which becomes available when the quasiparticle verbosity is added to the command line used to generate the input file:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -p p -g n -F gw_ppa_Gbnd10.in -V qp&lt;br /&gt;
&lt;br /&gt;
and you can edit the input file by setting:&lt;br /&gt;
 &lt;br /&gt;
 [[Variables#GTermKind|GTermKind]]= &amp;quot;BG&amp;quot;&lt;br /&gt;
&lt;br /&gt;
or you can simply add this line by hand to all the other &#039;&#039;gw_ppa_GbndX.in&#039;&#039; input files, for instance with the short loop sketched below.&lt;br /&gt;
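&lt;br /&gt;
A minimal sketch of such a loop (it assumes that Yambo accepts the variable simply appended at the end of the input file):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: append the terminator flag to every GbndX input that does not have it yet&lt;br /&gt;
 for f in gw_ppa_Gbnd*.in; do&lt;br /&gt;
   grep -q GTermKind ${f} || echo &#039;GTermKind= &amp;quot;BG&amp;quot;&#039; &amp;gt;&amp;gt; ${f}&lt;br /&gt;
 done&lt;br /&gt;
&lt;br /&gt;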
Now we can repeat the same calculations &lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10_term&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20_term&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and collect the new results:&lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
&lt;br /&gt;
in a new file called &#039;&#039;Gbnd_conv_terminator.dat&#039;&#039;. Now we can plot the same quantities as before by looking at the effect of having introduced the terminator. &lt;br /&gt;
&lt;br /&gt;
 gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:2 w lp t &amp;quot;Valence with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:3 w lp t &amp;quot;Conduction with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:($3-$2) w lp t &amp;quot;Gap with terminator&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Effect of the terminator in convergence of QP energies with respect sum over states&amp;quot;&amp;gt;&lt;br /&gt;
File:val_t.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:cond_t.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:gap_t.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt; is accelerated in particular for the single quasiparticle states.&lt;br /&gt;
* From the plot above we can see that &amp;lt;code&amp;gt;[[Variables#GbndRange|GbndRange]]&amp;lt;/code&amp;gt;=40 is sufficient to obtain a converged gap.&lt;br /&gt;
&lt;br /&gt;
===Beyond the plasmon pole approximation: a full frequency approach - real axis integration===&lt;br /&gt;
All the calculations performed up to now were based on the plasmon-pole approximation (PPA). Now we remove this approximation by evaluating numerically the frequency integral in the expression of the correlation self-energy (Σc). To this aim we need to evaluate the screening at a number of frequencies, in an interval determined by the electron-hole energy differences (energy differences between empty and occupied states) entering the sum over states. &lt;br /&gt;
Let&#039;s build the input file for a full frequency calculation by simply typing:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff.in -g n -p r&lt;br /&gt;
&lt;br /&gt;
and we can set the variables studied up to now to their converged values:&lt;br /&gt;
&lt;br /&gt;
 %[[Variables#BndsRnXd|BndsRnXd]]&lt;br /&gt;
 1 | 30 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXd|NGsBlkXd]]=3000 mRy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
 1 | 40 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]]=40000 mRy&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]] &lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 %QPkrange                      # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|8|9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and next vary, in different files (name them &#039;&#039;gw_ff10.in&#039;&#039; etc.), the number of frequencies at which we evaluate the screened Coulomb potential. This is done by setting [[Variables#ETStpsXd|ETStpsXd]] = 10, 50, 100, 150, 200, 250. &lt;br /&gt;
You can also run the [[bash_scripts|generate_inputs_3.sh]] bash script to generate the required input files; a minimal sketch of such a loop is shown below.&lt;br /&gt;
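&lt;br /&gt;
As a rough sketch only (assuming the generated &#039;&#039;gw_ff.in&#039;&#039; contains an &amp;lt;code&amp;gt;ETStpsXd&amp;lt;/code&amp;gt; line; the wiki script may differ):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: one input file per number of frequency steps&lt;br /&gt;
 for nf in 10 50 100 150 200 250; do&lt;br /&gt;
   sed &amp;quot;s/ETStpsXd.*/ETStpsXd= ${nf}/&amp;quot; gw_ff.in &amp;gt; gw_ff${nf}.in&lt;br /&gt;
 done&lt;br /&gt;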
&lt;br /&gt;
Next launch yambo:&lt;br /&gt;
 $ yambo -F gw_ff10.in -J ff10&lt;br /&gt;
 $ yambo -F gw_ff50.in -J ff50&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Clearly, as we are evaluating the screening for a large number of frequencies, these calculations will be heavier than the case above, where the screening was calculated for only two frequencies (zero and the plasmon-pole frequency). &lt;br /&gt;
As before, collect the valence and conduction bands as a function of the number of frequencies in a file called &#039;&#039;gw_ff.dat&#039;&#039; and plot the behaviour of the conduction and valence bands and the gap.&lt;br /&gt;
 &lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Real axis integration, convergences with respect the used number of frequencies&amp;quot;&amp;gt;&lt;br /&gt;
File:ff_v.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt;&lt;br /&gt;
File:ff_c.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:ff_g.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
&lt;br /&gt;
* Oscillations are still present, indicating that even more frequencies have to be considered. In general, a real-axis calculation is very demanding. &lt;br /&gt;
* The final value of the gap obtained up to now does not differ much from the one obtained at the plasmon-pole level (~50 meV).&lt;br /&gt;
&lt;br /&gt;
As the real-axis calculation is computationally demanding, in recent years an alternative method called the Multipole Approximation (MPA) has been developed in Yambo. The MPA can be seen as an extension of the PPA, where a finite number of poles is used instead of just one. It has been seen that even a small number of poles makes MPA results on par with a full real-axis calculation. For your reference, a tutorial on the MPA method is available at [[Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)|https://www.yambo-code.eu/wiki/index.php/Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)]]. For now, however, let us move on to the next section.&lt;br /&gt;
&lt;br /&gt;
===Secant Solver===&lt;br /&gt;
&lt;br /&gt;
The real-axis integration also makes it possible to go beyond the linear expansion when solving the quasiparticle equation. &lt;br /&gt;
The QP equation is a non-linear equation whose solution must be found using a suitable numerical algorithm. &lt;br /&gt;
[[File:Eqp_sec.png|none|x22px|caption]] &lt;br /&gt;
The most commonly used approach, based on the linearization of the self-energy operator, is the Newton method, which is the one we have used up to now. &lt;br /&gt;
Yambo can also search for the QP energies using a non-linear iterative method based on the [https://en.wikipedia.org/wiki/Secant_method Secant Method].&lt;br /&gt;
&lt;br /&gt;
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines to better approximate a root of a function &#039;&#039;f&#039;&#039;. The secant method can be thought of as a finite difference approximation of Newton&#039;s method. &lt;br /&gt;
The equation that defines the secant method is: &lt;br /&gt;
&lt;br /&gt;
[[File:secant_eq.png|none|x35px|caption]] &lt;br /&gt;
&lt;br /&gt;
The first two iterations of the secant method are shown in the following picture. The red curve shows the function f and the blue lines are the secants.&lt;br /&gt;
&lt;br /&gt;
[[File:Secant_method.png|center|250px|caption]] &lt;br /&gt;
&lt;br /&gt;
To see if there is any non-linear effect in the solution of the Dyson equation we compare the result of the calculation using the Newton solver as done before with the present case. &lt;br /&gt;
In order to use the secant method you need to edit one of the previous &#039;&#039;gw_ffN.in&#039;&#039; files, e.g. &#039;&#039;gw_ff100.in&#039;&#039;, and change:&lt;br /&gt;
&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;&lt;br /&gt;
to &lt;br /&gt;
 DysSolver= &amp;quot;s&amp;quot;&lt;br /&gt;
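&lt;br /&gt;
For instance, with a one-line sketch (assuming GNU &amp;lt;code&amp;gt;sed&amp;lt;/code&amp;gt; and the &amp;lt;code&amp;gt;DysSolver&amp;lt;/code&amp;gt; line written as above):&lt;br /&gt;
&lt;br /&gt;
 $ sed -i &#039;s/DysSolver= *&amp;quot;n&amp;quot;/DysSolver= &amp;quot;s&amp;quot;/&#039; gw_ff100.in&lt;br /&gt;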
&lt;br /&gt;
Repeat the calculations in the same way as before:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff100.in -J ff100&lt;br /&gt;
&lt;br /&gt;
Note that now the screening will &#039;&#039;not&#039;&#039; be calculated again, as it has been stored in the &#039;&#039;ffN&#039;&#039; directories, as can be seen in the report file:&lt;br /&gt;
&lt;br /&gt;
 [05] Dynamical Dielectric Matrix&lt;br /&gt;
 ================================&lt;br /&gt;
 [RD./ff10//ndb.em1d]----&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./ff10//ndb.em1d_fragment_1]--------------&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Comparing the output files, e.g. for the case with 100 freqs:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp&#039;&#039;  &#039;&#039;&#039;Newton Solver:&#039;&#039;&#039;&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
   7   8  -0.41188   -0.08708      2.91254 &lt;br /&gt;
   7   9   3.877976   1.421968    -3.417357 &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp_01&#039;&#039; &#039;&#039;&#039;Secant Solver:&#039;&#039;&#039;&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
   7   8   -0.41188   -0.08715      2.93518 &lt;br /&gt;
   7   9   3.877976   1.401408    -3.731649 &lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
From the comparison, we see that the effect is of the order of 20 meV, i.e. of the same order of magnitude as the accuracy of the GW calculation itself.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Interpolating Band Structures==&lt;br /&gt;
Up to now we have checked convergence for the gap. Now we want to calculate the quasiparticle corrections across the Brillouin zone in order to visualize the entire band structure along a path connecting high symmetry points.&lt;br /&gt;
&lt;br /&gt;
To do that we start by calculating the QP correction in the plasmon-pole approximation for all the k points of our sampling and for a number of bands around the gap. You can use a previous input file or generate a new one: &amp;lt;code&amp;gt; yambo -F gw_ppa_all_Bz.in -x -p p -g n &amp;lt;/code&amp;gt; and set the parameters found in the previous tests:&lt;br /&gt;
&lt;br /&gt;
 EXXRLvcs=  40        Ry &lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 30 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 3000            mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
and we calculate it for all the available k-points by setting:&lt;br /&gt;
 %QPkrange                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  1| 14|  6|11|&lt;br /&gt;
 %&lt;br /&gt;
 &lt;br /&gt;
Note that, as we have already calculated the screening with these parameters, you can save time by reusing that database: either run with &amp;lt;code&amp;gt; -J 30b_3Ry &amp;lt;/code&amp;gt; so that the previous directory is used, or, if you prefer to keep the new databases in a separate all_Bz directory, create it and copy the screening databases there:&lt;br /&gt;
&lt;br /&gt;
 $ mkdir all_Bz &lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./all_Bz/&lt;br /&gt;
&lt;br /&gt;
and launch the calculation:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_all_Bz.in -J all_Bz&lt;br /&gt;
&lt;br /&gt;
Now we can inspect the output and see that it contains the corrections for all the k-points and bands we asked for:&lt;br /&gt;
&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
    1.000000     6.000000    -1.299712    -0.219100     3.788044&lt;br /&gt;
    1.000000     7.000000    -1.296430    -0.241496     3.788092&lt;br /&gt;
    1.000000     8.000000    -1.296420    -0.243115     3.785947&lt;br /&gt;
    1.000000     9.000000     4.832399     0.952386    -3.679259&lt;br /&gt;
    1.00000     10.00000     10.76416      2.09915     -4.38743&lt;br /&gt;
    1.00000     11.00000     11.36167      2.48053     -3.91021&lt;br /&gt;
....&lt;br /&gt;
By plotting some of the &#039;&#039;o-all_Bz.qp&#039;&#039; columns it is possible to discuss some physical properties of the hBN QPs. Using columns 3 and (3+4), i.e. plotting the GW energies with respect to the LDA energies, we can deduce the band-gap renormalization and the stretching of the conduction/valence bands:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p  t &amp;quot;Eqp vs Elda&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[File:EqpvE0.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
Essentially we can see that the effect of the GW self-energy is the opening of the gap and a linear stretching of the conduction/valence bands, which can be estimated by performing a linear fit of the positive and negative energies separately (the zero is set at the top of the valence band). A sketch of such a fit is shown below. &lt;br /&gt;
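&lt;br /&gt;
A minimal sketch of such a fit (assuming the &#039;&#039;o-all_Bz.qp&#039;&#039; file with the columns shown above and the zero of energy at the valence band top; the fitted slopes estimate the valence and conduction stretching):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: fit Eqp = a*Elda + b separately for occupied (Eo&amp;lt;0) and empty (Eo&amp;gt;0) states&lt;br /&gt;
 gnuplot &amp;lt;&amp;lt;&#039;EOF&#039;&lt;br /&gt;
 av=1; bv=0; ac=1; bc=0&lt;br /&gt;
 fv(x) = av*x + bv&lt;br /&gt;
 fc(x) = ac*x + bc&lt;br /&gt;
 fit fv(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:(($3&amp;lt;0) ? $3+$4 : 1/0) via av,bv&lt;br /&gt;
 fit fc(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:(($3&amp;gt;0) ? $3+$4 : 1/0) via ac,bc&lt;br /&gt;
 print &amp;quot;valence slope: &amp;quot;, av, &amp;quot;   conduction slope: &amp;quot;, ac&lt;br /&gt;
 EOF&lt;br /&gt;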
&lt;br /&gt;
In order to calculate the band structure, however, we need to interpolate the values we have calculated above on a given path. In Yambo the interpolation is done by the executable &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; (Yambo Post Processing).&lt;br /&gt;
By typing: &lt;br /&gt;
 $ ypp -h  &lt;br /&gt;
you will recognize that in order to interpolate the bands we need to build a ypp input file using&lt;br /&gt;
 $ ypp -s b&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Before editing the &#039;&#039;ypp.in&#039;&#039; input file and running the interpolation, it is important to know that &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; uses an algorithm &amp;lt;ref&amp;gt; Warren E. Pickett, Henry Krakauer, and Philip B. Allen Phys. Rev. B 38, 2721 &amp;lt;/ref&amp;gt;  that cannot be used in presence of time-reversal (TR) symmetry. &lt;br /&gt;
As a first step we therefore remove the TR symmetry by typing:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -y&lt;br /&gt;
&lt;br /&gt;
and we uncomment the corresponding line to remove the TR.&lt;br /&gt;
&lt;br /&gt;
 fixsyms                      # [R] Reduce Symmetries&lt;br /&gt;
 #RmAllSymm                   # Remove all symmetries&lt;br /&gt;
 #RmTimeRev                   # Remove Time Reversal&lt;br /&gt;
&lt;br /&gt;
and launch&lt;br /&gt;
 &lt;br /&gt;
 $ ypp&lt;br /&gt;
&lt;br /&gt;
This will create a new directory called &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; where a &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory containing the electronic structure in the absence of TR is present. &lt;br /&gt;
We will calculate the band structure in the &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
 $ cd FixSymmm&lt;br /&gt;
&lt;br /&gt;
After having performed the usual setup&lt;br /&gt;
&lt;br /&gt;
 $yambo&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
we can generate the input for the band interpolation:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -s b -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
and edit the &#039;&#039;ypp_bands.in&#039;&#039; file:&lt;br /&gt;
&lt;br /&gt;
 electrons                    # [R] Electrons (and holes)&lt;br /&gt;
 bnds                         # [R] Bands&lt;br /&gt;
 INTERP_mode= &amp;quot;BOLTZ&amp;quot;         # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach) &lt;br /&gt;
 OutputAlat= 4.716000           # [a.u.] Lattice constant used for &amp;quot;alat&amp;quot; ouput format&lt;br /&gt;
 cooIn= &amp;quot;rlu&amp;quot;                   # Points coordinates (in) cc/rlu/iku/alat&lt;br /&gt;
 cooOut= &amp;quot;rlu&amp;quot;     &lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   1 | 100 |                   # Number of bands&lt;br /&gt;
 %&lt;br /&gt;
 INTERP_Shell_Fac= 20.00000     # The bigger it is a higher number of shells is used&lt;br /&gt;
 CIRCUIT_E_DB_path= &amp;quot;none&amp;quot;      # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)&lt;br /&gt;
 BANDS_path= &amp;quot;&amp;quot;                 # BANDS path points labels (G,M,K,L...)&lt;br /&gt;
 BANDS_steps= 10  &lt;br /&gt;
 #BANDS_built_in                # Print the bands of the generating points of the circuit using the nearest internal point&lt;br /&gt;
 %BANDS_kpts   &lt;br /&gt;
 % &lt;br /&gt;
&lt;br /&gt;
We modify the following lines:&lt;br /&gt;
 BANDS_steps=30&lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   6 | 11 |                   # Number of bands &lt;br /&gt;
 %&lt;br /&gt;
 %BANDS_kpts                    # K points of the bands circuit&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.00000 |0.00000 |0.00000 |&lt;br /&gt;
  0.50000 |-.50000 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.50000 |&lt;br /&gt;
  0.00000 |0.00000 |0.50000 |&lt;br /&gt;
  0.50000 |-.50000 |0.50000 |&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
which means we assign 30 points to each segment, we ask to interpolate 3 occupied and 3 empty bands, and we assign the following path passing through the high-symmetry points: K Γ M K H A L.&lt;br /&gt;
Launching:&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
will produce the output file &#039;&#039;o.bands_interpolated&#039;&#039; containing:&lt;br /&gt;
&lt;br /&gt;
                 &lt;br /&gt;
 #&lt;br /&gt;
 #   |k|        b6         b7         b8         b9         b10        b11        kx         ky         kz&lt;br /&gt;
 #&lt;br /&gt;
 #&lt;br /&gt;
     0.00000   -7.22092   -0.13402   -0.13395    4.67691    4.67694   10.08905    0.33300   -0.66667    0.00000&lt;br /&gt;
     0.03725   -7.18857   -0.17190   -0.12684    4.66126    4.71050   10.12529    0.32190   -0.64445    0.00000&lt;br /&gt;
...&lt;br /&gt;
 &lt;br /&gt;
and we can plot the bands using gnuplot:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o.bands_interpolated&amp;quot; u 0:2 w l, &amp;quot;o.bands_interpolated&amp;quot; u 0:3 w l, ...&lt;br /&gt;
&lt;br /&gt;
[[File:bands_lda.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
and you can recognize the positions of the high-symmetry points by inspecting the last three columns. A short sketch that builds the full plot command for all six bands is shown below.&lt;br /&gt;
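&lt;br /&gt;
Instead of typing one term per band, you can build the plot command in a small bash loop (a sketch; column 1 is |k| and columns 2-7 are bands b6-b11):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: plot all six interpolated bands in one shot&lt;br /&gt;
 cmd=&amp;quot;p&amp;quot;&lt;br /&gt;
 for col in 2 3 4 5 6 7; do&lt;br /&gt;
   cmd=&amp;quot;${cmd} \&amp;quot;o.bands_interpolated\&amp;quot; u 1:${col} w l notitle,&amp;quot;&lt;br /&gt;
 done&lt;br /&gt;
 gnuplot -p -e &amp;quot;${cmd%,}&amp;quot;&lt;br /&gt;
&lt;br /&gt;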
Note that up to now we have interpolated the LDA band structure. In order to plot the GW band structure, we need to tell &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; in the input file where the &#039;&#039;ndb.QP&#039;&#039; database is found. This is achieved by adding in the &#039;&#039;ypp_bands.in&#039;&#039; file the line:&lt;br /&gt;
&lt;br /&gt;
  GfnQPdb= &amp;quot;E &amp;lt; ./all_Bz/ndb.QP&amp;quot;&lt;br /&gt;
&lt;br /&gt;
and relaunch &lt;br /&gt;
&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
Now the file &#039;&#039;o.bands_interpolated_01&#039;&#039; contains the GW interpolated band structure. We can plot the LDA and GW band structure together by using the gnuplot script [[gnuplot_scripts|bands.gnu]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines  widths=500px heights=500px  perrow=2 caption=&amp;quot;Band structure of bulk hBN&amp;quot;&amp;gt;&lt;br /&gt;
File:hBN_bands.png| LDA and GW band structure &lt;br /&gt;
File:hBN_bands_lit.png| LDA and GW band structure from Ref. &amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*As expected the effect of the GW correction is to open the gap.&lt;br /&gt;
*Comparing the obtained band structure with the one found in the literature by Arnaud and coworkers &amp;lt;ref name=&amp;quot;Arnaud&amp;quot;&amp;gt; B. Arnaud, S. Lebegue, P. Rabiller, and M. Alouani, Phys. Rev. Lett. 96, 026402 (2006)&amp;lt;/ref&amp;gt; we find a very nice qualitative agreement. &lt;br /&gt;
*Quantitatively we find a smaller gap: about 5.2 eV (indirect gap) and 5.7 eV (direct gap), while Ref.&amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; reports 5.95 eV for the indirect gap and a minimum direct band gap of 6.47 eV. Other values are also reported in the literature depending on the pseudopotentials used, the starting functional and the type of self-consistency (see below). &lt;br /&gt;
*The present tutorial has been run with a small k-point grid, which is an important parameter to be checked, so convergence with respect to the k-point sampling has to be validated.&lt;br /&gt;
&lt;br /&gt;
==Step 4: Summary of the convergence parameters==&lt;br /&gt;
We have calculated the band structure of hBN starting from a DFT calculation; here we summarize the main variables we have checked to achieve convergence:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; # [XX] Exchange RL components&lt;br /&gt;
Number of G-vectors in the exchange. This number should be checked carefully. Generally a large number is needed, as the QP energies show a slow convergence. The calculation of the exchange part is rather fast. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; #[Xp] Polarization function bands&lt;br /&gt;
Number of bands in the independent-particle response function from which the dielectric matrix is calculated. This parameter also has to be checked carefully, together with NGsBlkXp, as the two variables are interconnected.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;  # [Xp] Response block size&lt;br /&gt;
Number of G-vectors (block size) in the dielectric matrix. This parameter also has to be checked carefully, together with BndsRnXp. A large number of bands and blocks can make the calculation very demanding.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]] &amp;lt;/code&amp;gt; # [Xp] [cc] Electric Field&lt;br /&gt;
Direction of the electric field for the calculation of the q=0 component of the dielectric constant e(q,w). In a bulk system it can be set to (1,1,1); attention must be paid for non-3D systems.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]] &amp;lt;/code&amp;gt; # [Xp] Plasmon-pole imaginary energy: this is the second frequency used to fit the Godby-Needs plasmon-pole model (PPM). If the results change significantly when this frequency is varied, the PPM is not adequate for your calculation and you need to go beyond it, e.g. with real-axis integration. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]] &amp;lt;/code&amp;gt; # [GW] G[W] bands range&lt;br /&gt;
Number of bands used to expand the Green&#039;s function. This number is usually larger than the number of bands used to calculate the dielectric constant. Single quasiparticle energies converge slowly with respect to GbndRnge; energy differences behave better. You can use the terminator technique to mitigate the slow convergence. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GDamping|GDamping]] &amp;lt;/code&amp;gt; # [GW] G[W] damping&lt;br /&gt;
Small damping used in the definition of the Green&#039;s function, the delta &lt;br /&gt;
parameter. The final result should not depend on it; it is usually set to 0.1 eV.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#dScStep|dScStep]] &amp;lt;/code&amp;gt; # [GW] &lt;br /&gt;
Energy step to evaluate Z factors&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#DysSolver|DysSolver]] &amp;lt;/code&amp;gt; # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
Parameter that selects the solver of the Dyson equation: &amp;quot;n&amp;quot; for the Newton linearization, &amp;quot;s&amp;quot; for the non-linear secant method.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GTermKind|GTermKind]] &amp;lt;/code&amp;gt; [GW] GW terminator &lt;br /&gt;
Terminator for the self-energy&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;. We have seen how this speeds up the convergence with respect to the number of empty bands.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#%QPkrange |QPkrange ]] &amp;lt;/code&amp;gt; # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
K-points and band range where you want to calculate the GW correction. The syntax is&lt;br /&gt;
first kpoint | last kpoint | first band | last band&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8661</id>
		<title>GW on h-BN (standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=GW_on_h-BN_(standalone)&amp;diff=8661"/>
		<updated>2025-05-18T13:21:31Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a modified version of the full tutorial on GW computations present on the Yambo wiki. Later, you can have a look at the extended version at [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will learn how to:&lt;br /&gt;
* Calculate quasi-particle corrections in the Hartree-Fock approximation &lt;br /&gt;
* Calculate quasi-particle corrections in the GW approximation &lt;br /&gt;
* Choose the input parameters for a meaningful converged calculation&lt;br /&gt;
* Plot a band structure including quasi-particle corrections&lt;br /&gt;
We will use bulk hBN as an example system. Before starting, you need to obtain the appropriate tarball. See instructions on the [[Tutorials|main tutorials page]]. &amp;lt;br&amp;gt;&lt;br /&gt;
We strongly recommend that you first complete the [[First steps: a walk through from DFT to optical properties]] tutorial.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &#039;&#039;&#039;Prerequisites&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
At this stage, you should have already completed the following modules:&lt;br /&gt;
* [[Generating the Yambo databases|Generating the Yambo databases]]&lt;br /&gt;
* Step 2 Run [[Initialization]] tutorial&lt;br /&gt;
Now we can start to --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The aim of the present tutorial is to obtain quasiparticle correction to energy levels using many-body perturbation theory (MBPT). &amp;lt;br&amp;gt;&lt;br /&gt;
The general non-linear quasiparticle equation reads:&lt;br /&gt;
[[File:Eqp_1.png|none|x26px|caption]] &lt;br /&gt;
As a first step we want to evaluate the self energy Σ entering in the quasiparticle equation. In the GW approach the self-energy can be separated into two components: a static term called the exchange self-energy (Σx), and a dynamical term (energy dependent) called the correlation self-energy (Σc):&lt;br /&gt;
[[File:Sigma.png|none|x25px|caption]]&lt;br /&gt;
We will treat these two terms separately and demonstrate how to set the most important variables for calculating each term.&lt;br /&gt;
In practice we will compute  the quasi-particle corrections to the one particle Kohn-Sham eigenvalues obtained through a DFT calculation. &lt;br /&gt;
&lt;br /&gt;
The steps are the following:&lt;br /&gt;
&lt;br /&gt;
==Step 1: The Exchange Self Energy or HF quasi-particle correction==&lt;br /&gt;
&lt;br /&gt;
We start by evaluating the exchange Self-Energy and the corresponding Quasiparticle energies (Hartree-Fock energies). &lt;br /&gt;
Follow the module on &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; and then return to this tutorial &#039;&#039;How to obtain the quasiparticle band structure of a bulk material: h-BN&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Step 2: The Correlation Self-Energy and Quasiparticle Energies==&lt;br /&gt;
Once we have calculated the exchange part, we next turn our attention to the more demanding dynamical part. The correlation part of the self-energy in a plane wave representation reads:&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
In the expression for the correlation self energy, we have (1) a summation over bands, (2) an integral over the Brillouin Zone, and (3) a sum over the G vectors. In contrast with the case of Σx, the summation over bands extends over &#039;&#039;all&#039;&#039; bands (including the unoccupied ones), and so convergence tests are needed. Another important difference is that the Coulomb interaction is now screened so a fundamental ingredient is the evaluation of the dynamical dielectric matrix. The expression for the dielectric matrix, calculated at the RPA level and including local field effects, has been already treated in the [[Local fields|Local fields]] tutorial.&lt;br /&gt;
&lt;br /&gt;
In the following, we will see two ways to take into account the dynamical effects. First, we will see how to set the proper parameters to obtain a model dielectric function based on a widely used approximation, which models the energy dependence of each component of the dielectric matrix with a single pole function. &lt;br /&gt;
Secondly, we will see how to perform calculations by evaluating the dielectric matrix on a regular grid of frequencies. &lt;br /&gt;
&lt;br /&gt;
Once the correlation part of the self-energy is calculated, we will check the convergence of the different parameters with respect to some final quantity, such as the gap. &lt;br /&gt;
&lt;br /&gt;
After computing the frequency-dependent self-energy, we will discover that in order to solve the quasiparticle equation we will need to know its value &#039;&#039;at the quasiparticle energy itself&#039;&#039;. In the following, unless explicitly stated, we will solve the non-linear quasiparticle equation at first order, by expanding the self-energy around the Kohn-Sham eigenvalue. In this way the quasiparticle equation reads:&lt;br /&gt;
&lt;br /&gt;
[[File:Eqp_2.png|none|x26px|caption]] &lt;br /&gt;
&lt;br /&gt;
where the renormalization factor Z is defined as:&lt;br /&gt;
&lt;br /&gt;
[[File:z_fac.png|none|x40px|caption]] &lt;br /&gt;
&lt;br /&gt;
===The Plasmon Pole approximation===&lt;br /&gt;
As stated above, the basic idea of the plasmon-pole approximation is to approximate the frequency dependence of the dielectric matrix with a single pole function of the form:&lt;br /&gt;
[[File:ppa.png|none|x26px|caption]]&lt;br /&gt;
The two parameters R&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; and Ω&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; are obtained by a fit (for each component), after having calculated the RPA dielectric matrix at two given frequencies.&lt;br /&gt;
Yambo calculates the dielectric matrix in the static limit (ω=0) and at a user-defined frequency called the plasmon-pole frequency (ω=iωp). &lt;br /&gt;
Such an approximation has the big computational advantage of calculating the dielectric matrix for only two frequencies and leads to an analytical expression for the frequency integral of the correlation self-energy.&lt;br /&gt;
==== Input file generation ====&lt;br /&gt;
Let&#039;s start by building up the input file for a GW/PPA calculation, including the calculation of the exchange self-energy. From &amp;lt;code&amp;gt;yambo -H&amp;lt;/code&amp;gt; you should understand that the correct option is &amp;lt;code&amp;gt;yambo -x -p p -g n&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo -x -p p -g n -F gw_ppa.in&lt;br /&gt;
&lt;br /&gt;
Let&#039;s modify the input file in the following way: &lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] = 40         Ry    # [XX] Exchange RL components&lt;br /&gt;
 [[Variables#VXCRLvcs|VXCRLvcs]] = 3187        RL      # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;                   # [X] IP/Hartree/ALDA/LRC/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 10 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]]= 1000          mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#PPAPntXp|PPAPntXp]] = 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GDamping=  0.10000     eV    # [GW] G[W] damping&lt;br /&gt;
 dScStep=  0.10000      eV    # [GW] Energy step to evaluate Z factors&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;               # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  7|  7|  8|  9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Brief explanation of some settings:&lt;br /&gt;
* Similar to the Hartree Fock study, we will concentrate on the convergence of the &#039;&#039;&#039;direct&#039;&#039;&#039; gap of the system. Hence we select the last occupied (8) and first unoccupied (9) bands for k-point number 7 in the &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable. &lt;br /&gt;
* We also keep &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; at its converged value of 40 Ry as obtained in the &#039;&#039;&#039;[[Hartree Fock]]&#039;&#039;&#039; tutorial.&lt;br /&gt;
* For the moment we keep the plasmon-pole energy &amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]]&amp;lt;/code&amp;gt; fixed at its default value (1 Hartree).&lt;br /&gt;
* We fix the direction of the electric field used to evaluate the dielectric matrix to a generic value: &amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]]&amp;lt;/code&amp;gt;=(1,1,1).&lt;br /&gt;
* Later we will study convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;; for now just set them to the values indicated.&lt;br /&gt;
&lt;br /&gt;
==== Understanding the output ====&lt;br /&gt;
Let&#039;s look at the typical Yambo output. Run Yambo with an appropriate &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
   &lt;br /&gt;
 $ yambo -F gw_ppa.in -J 10b_1Ry&lt;br /&gt;
&lt;br /&gt;
In the standard output you can recognise the different steps of the calculation: calculation of the screening matrix (evaluation of the non-interacting and interacting response), calculation of the exchange self-energy, and finally the calculation of the correlation self-energy and quasiparticle energies. Moreover, information on memory usage and execution time is reported: &lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Dynamic Dielectric Matrix (PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;08s&amp;gt; Xo@q[3] |########################################| [100%] 03s(E) 03s(X)&lt;br /&gt;
 &amp;lt;08s&amp;gt; X@q[3] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [06] Bare local and non-local Exchange-Correlation&lt;br /&gt;
 &amp;lt;43s&amp;gt; EXS |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07] Dyson equation: Newton solver&lt;br /&gt;
 &amp;lt;43s&amp;gt; [07.01] G0W0 (W PPA)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; G0W0 (W PPA) |########################################| [100%] --(E) --(X)&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;45s&amp;gt; [07.02] QP properties and I/O&lt;br /&gt;
 &amp;lt;45s&amp;gt; [08] Game Over &amp;amp; Game summary&lt;br /&gt;
&lt;br /&gt;
Let&#039;s have a look at the report and output. The output file &#039;&#039;o-10b_1Ry.qp&#039;&#039; contains (for each band and k-point that we indicated in the input file) the values of the bare KS eigenvalue, its GW correction and the correlation part of the self energy:&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
  7     8       -0.411876  -0.567723   2.322443&lt;br /&gt;
  7     9        3.877976   2.413773  -2.232241&lt;br /&gt;
&lt;br /&gt;
In the header you can see the details of the calculation; for instance, it reports that &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=1 Ry corresponds to 5 G-vectors: &lt;br /&gt;
&lt;br /&gt;
 #  X G`s            [used]:  5&lt;br /&gt;
&lt;br /&gt;
Other information can be found in the report file &#039;&#039;r-10b_1Ry_em1d_ppa_HF_and_locXC_gw0&#039;&#039;, such as the renormalization factor defined above, the value of the &#039;&#039;exchange&#039;&#039; self-energy (non-local XC) and of the DFT exchange-correlation potential (local XC): &lt;br /&gt;
&lt;br /&gt;
 [07.02] QP properties and I/O&lt;br /&gt;
  =============================&lt;br /&gt;
  Legend (energies in eV):&lt;br /&gt;
  - B  : Band       - Eo  : bare energy&lt;br /&gt;
  - E  : QP energy  - Z   : Renormalization factor&lt;br /&gt;
  - So : Sc(Eo)     - S   : Sc(E)&lt;br /&gt;
  - dSp: Sc derivative precision&lt;br /&gt;
  - lXC: Starting Local XC (DFT)&lt;br /&gt;
  -nlXC: Starting non-Local XC (HF)&lt;br /&gt;
  QP [eV] @ K [7] (iku): 0.000000 -0.500000  0.000000&lt;br /&gt;
   B=8 Eo= -0.41 E= -0.98 E-Eo= -0.56 Re(Z)=0.81 Im(Z)=-.2368E-2 nlXC=-19.13 lXC=-16.11 So= 2.322&lt;br /&gt;
   B=9 Eo=  3.88 E=  6.29 E-Eo=  2.41 Re(Z)=0.83 Im(Z)=-.2016E-2 nlXC=-5.536 lXC=-10.67 So=-2.232&lt;br /&gt;
&lt;br /&gt;
Extended information can also be printed in the output by activating the &amp;lt;code&amp;gt;[[Variables#ExtendOut|ExtendOut]]&amp;lt;/code&amp;gt; flag in the input file. This optional flag becomes available when the &amp;lt;code&amp;gt;-V qp&amp;lt;/code&amp;gt; verbosity option is added while building the input file. The plasmon-pole screening, the exchange self-energy and the quasiparticle energies are also saved in databases so that they can be reused in further runs:&lt;br /&gt;
&lt;br /&gt;
 $ ls ./10b_1Ry&lt;br /&gt;
 ndb.pp ndb.pp_fragment_1 ... ndb.HF_and_locXC ndb.QP&lt;br /&gt;
&lt;br /&gt;
===Convergence tests for a quasi particle calculation===&lt;br /&gt;
&lt;br /&gt;
Now we can check the convergence of the different variables entering the expression of the correlation part of the self-energy.&amp;lt;br&amp;gt; &lt;br /&gt;
First, we focus on the parameter governing the &#039;&#039;screening matrix&#039;&#039; you have already seen in the [[RPA/IP]] section. In contrast to the calculation of the [[RPA/IP]] dielectric function, where you considered either the optical limit or a finite q response (EELS), here the dielectric matrix will be calculated for &#039;&#039;all&#039;&#039; q-points determined by the choice of k-point sampling.&lt;br /&gt;
 &lt;br /&gt;
The parameters that need to be converged can be understood by looking at the expression of the dielectric matrix:&lt;br /&gt;
[[File:Yambo-CH5.png|none|x30px|Yambo tutorial image]]&lt;br /&gt;
where &#039;&#039;&amp;amp;chi;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&#039;&#039; is given by&lt;br /&gt;
[[File:Dyson_rpa.png|none|x40px|Yambo tutorial image]]&lt;br /&gt;
and  &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; is given by&lt;br /&gt;
[[File:ChiO.png|none|x45px|Yambo tutorial image]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; : The dimension of the microscopic inverse matrix, related to [[Local fields]]&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; : The sum on bands (c,v) in the independent particle &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Converging Screening Parameters====&lt;br /&gt;
Here we will check the convergence of the gap starting from the variables controlling the screening reported above: the bands employed to build the RPA response function &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and the number of blocks (G,G&#039;) of the dielectric matrix ε&amp;lt;sup&amp;gt;-1&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;G,G&#039;&amp;lt;/sub&amp;gt;  &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;. &lt;br /&gt;
In the next section we will study convergence with respect to the sum over states (the sum over &#039;&#039;m&#039;&#039; in the Σ&amp;lt;sub&amp;gt;c&amp;lt;/sub&amp;gt; expression); here let&#039;s fix &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; to a reasonable value (40 bands). &lt;br /&gt;
&lt;br /&gt;
Let&#039;s build a series of input files differing by the values of bands and block sizes in &amp;amp;chi;&amp;lt;sup&amp;gt;0&amp;lt;/sup&amp;gt;&amp;lt;sub&amp;gt;GG&#039;&amp;lt;/sub&amp;gt; considering &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in the range 10-50 (upper limit) and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; in the range 1000 to 5000 mRy. To do this by hand, file by file, open the &#039;&#039;gw_ppa.in&#039;&#039; file in an editor and change to:&lt;br /&gt;
 [[Variables#NGsBlkXp|NGsBlkXp]] = &#039;&#039;&#039;2000 mRy&#039;&#039;&#039;&lt;br /&gt;
while leaving the rest untouched. Repeat for 3000 mRy, 4000 mRy etc. Next, for each &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt; change to:&lt;br /&gt;
 % [[Variables#BndsRnXp|BndsRnXp]]&lt;br /&gt;
   1 | 20 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
and repeat for 30, 40 and so on. Give a &#039;&#039;&#039;different name&#039;&#039;&#039; to each file: &#039;&#039;gw_ppa_Xb_YRy.in&#039;&#039; with X=10,20,30,40 and Y=1,2,3,4,5 Ry (i.e. 1000-5000 mRy).&lt;br /&gt;
&lt;br /&gt;
This is obviously quite tedious. However, you can automate both the input construction and the code execution using bash or python scripts (indeed, later you will learn how to use the yambo-python tool [http://www.yambo-code.org/wiki/index.php?title=GW_tutorial._Convergence_and_approximations_(BN)] for this task). For now, you can use the simple [[bash_scripts|generate_inputs_1.sh]] bash script to generate the required input files (after copying the script you need to make it executable with &amp;lt;code&amp;gt;$ chmod +x name_of_the_script.sh &amp;lt;/code&amp;gt;).&lt;br /&gt;
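&lt;br /&gt;
If you prefer to write your own loop, here is a minimal sketch of such a generation script (it is not necessarily identical to the provided &#039;&#039;generate_inputs_1.sh&#039;&#039;, and it assumes your template &#039;&#039;gw_ppa.in&#039;&#039; contains a &amp;lt;code&amp;gt;NGsBlkXp&amp;lt;/code&amp;gt; line and a &amp;lt;code&amp;gt;BndsRnXp&amp;lt;/code&amp;gt; block; adapt the patterns otherwise):&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # sketch: one input file per (BndsRnXp, NGsBlkXp) pair, built from the template gw_ppa.in&lt;br /&gt;
 for nb in 10 20 30 40; do&lt;br /&gt;
   for ry in 1 2 3 4 5; do&lt;br /&gt;
     out=gw_ppa_${nb}b_${ry}Ry.in&lt;br /&gt;
     cp gw_ppa.in $out&lt;br /&gt;
     # set the response block size (energy cutoff of the dielectric matrix)&lt;br /&gt;
     sed -i &amp;quot;s/NGsBlkXp=.*/NGsBlkXp= ${ry} Ry/&amp;quot; $out&lt;br /&gt;
     # overwrite the line following the BndsRnXp marker with the new bands range&lt;br /&gt;
     sed -i &amp;quot;/BndsRnXp/{n;s/.*/  1 | ${nb} |/}&amp;quot; $out&lt;br /&gt;
   done&lt;br /&gt;
 done&lt;br /&gt;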
&lt;br /&gt;
Finally launch the calculations:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_10b_1Ry.in -J 10b_1Ry&lt;br /&gt;
 $ yambo -F gw_ppa_10b_2Ry.in -J 10b_2Ry&lt;br /&gt;
 ...&lt;br /&gt;
 $ yambo -F gw_ppa_40b_5Ry.in -J 40b_5Ry&lt;br /&gt;
&lt;br /&gt;
Once the jobs have finished we can collect the quasiparticle energies for fixed &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; in different files named e.g. &#039;&#039;10b.dat, 20b.dat&#039;&#039; etc. for plotting, putting in separate columns: the energy cutoff; the size of the G blocks; the quasiparticle energy of the valence band; and that of the conduction band.&lt;br /&gt;
To do this e.g. for the &#039;&#039;10b.dat&#039;&#039; file you can type:&lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;8.000&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
to parse the valence band quasiparticle energies  and &lt;br /&gt;
 $ cat o-10b* | grep &amp;quot;9.000&amp;quot; |  awk &#039;{print $3+$4}&#039; &lt;br /&gt;
for the conduction band, and put all the data in the &#039;&#039;10b.dat&#039;&#039; file. As there are many files to process you can use the [[bash_scripts|parse_qps.sh]] script to create the &#039;&#039;10b.dat&#039;&#039; file, editing the script to change the &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; value for the other output files. &lt;br /&gt;
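&lt;br /&gt;
As a reference, a minimal sketch of such a parsing loop is shown below (it assumes the file naming used above and reuses the same &amp;lt;code&amp;gt;grep&amp;lt;/code&amp;gt; patterns; the provided &#039;&#039;parse_qps.sh&#039;&#039; may differ in the details):&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # sketch: build 10b.dat with columns: cutoff (Ry), G vectors used, valence QP energy, conduction QP energy&lt;br /&gt;
 nb=10&lt;br /&gt;
 rm -f ${nb}b.dat&lt;br /&gt;
 for ry in 1 2 3 4 5; do&lt;br /&gt;
   qp=o-${nb}b_${ry}Ry.qp&lt;br /&gt;
   # number of G vectors actually used, taken from the output header&lt;br /&gt;
   ng=$(grep &amp;quot;X G&amp;quot; $qp | head -1 | awk &#039;{print $NF}&#039;)&lt;br /&gt;
   # quasiparticle energies E = Eo + (E-Eo); grep acts on a single file, so there is no filename prefix&lt;br /&gt;
   ev=$(grep &amp;quot;8.000&amp;quot; $qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   ec=$(grep &amp;quot;9.000&amp;quot; $qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   echo &amp;quot;$ry $ng $ev $ec&amp;quot; &amp;gt;&amp;gt; ${nb}b.dat&lt;br /&gt;
 done&lt;br /&gt;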
&lt;br /&gt;
Once we have collected all the quasiparticle values we can plot the gap, or the valence and conduction band energies separately, as a function of the block size or energy cutoff:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:3 w lp t &amp;quot; Valence BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:3 w lp t &amp;quot;Valence BndsRnXp=20&amp;quot;,.. &lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:4 w lp t &amp;quot; Conduction BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:4 w lp t &amp;quot;Conduction BndsRnXp=20&amp;quot;,..&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;10b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot; Gap BndsRnXp=10&amp;quot;, &amp;quot;20b.dat&amp;quot; u 1:($4-$3) w lp t &amp;quot;gap BndsRnXp=20&amp;quot;,..  &lt;br /&gt;
or both using e.g. the [[gnuplot_scripts|ppa_gap.gnu]] gnuplot script:&lt;br /&gt;
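&lt;br /&gt;
For instance, a compact way to draw one gap curve per &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; value is gnuplot&#039;s &amp;lt;code&amp;gt;plot for&amp;lt;/code&amp;gt; iteration (a minimal sketch, not necessarily identical to &#039;&#039;ppa_gap.gnu&#039;&#039;):&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; # columns: 1 = cutoff, 3 = valence, 4 = conduction&lt;br /&gt;
 gnuplot&amp;gt; p for [nb in &amp;quot;10 20 30 40&amp;quot;] sprintf(&amp;quot;%sb.dat&amp;quot;,nb) u 1:($4-$3) w lp t sprintf(&amp;quot;Gap BndsRnXp=%s&amp;quot;,nb)&lt;br /&gt;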
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect to the screening parameters&amp;quot;&amp;gt;&lt;br /&gt;
File:ppa2.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:ppa3.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looking at the plot we can see that:&lt;br /&gt;
* The two parameters are not totally independent (see e.g. the valence quasiparticle convergence), and this is the reason why we converged them simultaneously. &lt;br /&gt;
* The gap (energy difference) converges faster than the single quasiparticle states.&lt;br /&gt;
* The convergence criterion depends on the degree of accuracy we are looking for, but considering the approximations behind the calculations (plasmon-pole etc.), it is not always a good idea to enforce too strict a criterion.  &lt;br /&gt;
* Even if not fully converged, we can consider an upper limit of &amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt;=30 and &amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;=3 Ry as reasonable parameters.&lt;br /&gt;
&lt;br /&gt;
Finally, notice that recent versions of Yambo (from 4.5 onwards) provide a terminator for the dielectric matrix that accelerates convergence with respect to the number of bands:&lt;br /&gt;
you can activate it with the &amp;lt;code&amp;gt;-V resp&amp;lt;/code&amp;gt; verbosity and by setting &amp;lt;code&amp;gt;XTermKind= &amp;quot;BG&amp;quot;&amp;lt;/code&amp;gt;. You can compare the convergence of the dielectric matrix with and without the terminator,&lt;br /&gt;
similarly to what happens for the self-energy (see the next section).&lt;br /&gt;
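&lt;br /&gt;
For instance (a sketch with a hypothetical file name; the &amp;lt;code&amp;gt;-p p -g n&amp;lt;/code&amp;gt; flags are the same ones used to build the plasmon-pole GW inputs):&lt;br /&gt;
 $ yambo -p p -g n -V resp -F gw_ppa_30b_3Ry_term.in&lt;br /&gt;
and then, inside the generated input file, set:&lt;br /&gt;
 XTermKind= &amp;quot;BG&amp;quot;&lt;br /&gt;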
&lt;br /&gt;
====Converging the sum over states in the correlation self-energy====&lt;br /&gt;
From now on we will keep these parameters fixed and perform a convergence study on the sum over states in the correlation self-energy (Σc), controlled by &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
In order to reuse the screening previously calculated we can copy the plasmon-pole parameters saved in the &amp;lt;code&amp;gt;30b_3Ry&amp;lt;/code&amp;gt; directory into the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory. In this way the screening will be read by Yambo instead of being calculated again:&lt;br /&gt;
&lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./SAVE/.&lt;br /&gt;
&lt;br /&gt;
(Note: you may have to delete these files before running the BSE tutorials)&lt;br /&gt;
&lt;br /&gt;
In order to use the databases we have to be sure to have the same plasmon-pole parameters in our input files.&lt;br /&gt;
Edit &#039;&#039;gw_ppa_30b_3Ry.in&#039;&#039; and modify &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; in order to have a number of bands ranging from 10 to 80 in different files named &#039;&#039;gw_ppa_Gbnd10.in&#039;&#039;, &#039;&#039;gw_ppa_Gbnd20.in&#039;&#039; etc. You can also run the [[bash_scripts|generate_inputs_2.sh]] bash script to generate the required input files.&lt;br /&gt;
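&lt;br /&gt;
Alternatively, a minimal sketch of such a loop (not necessarily identical to &#039;&#039;generate_inputs_2.sh&#039;&#039;) is:&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # sketch: vary the upper limit of GbndRnge from 10 to 80, starting from gw_ppa_30b_3Ry.in&lt;br /&gt;
 for gb in 10 20 30 40 50 60 70 80; do&lt;br /&gt;
   out=gw_ppa_Gbnd${gb}.in&lt;br /&gt;
   cp gw_ppa_30b_3Ry.in $out&lt;br /&gt;
   # overwrite the line following the GbndRnge marker with the new bands range&lt;br /&gt;
   sed -i &amp;quot;/GbndRnge/{n;s/.*/  1 | ${gb} |/}&amp;quot; $out&lt;br /&gt;
 done&lt;br /&gt;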
&lt;br /&gt;
Next, launch yambo for each input:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and as done before we can inspect the obtained quasiparticle energies: &lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
for the valence bands, and &lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; o-Gbnd*  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
for the conduction band.&lt;br /&gt;
&lt;br /&gt;
Collect the results in a text file &#039;&#039;Gbnd_conv.dat&#039;&#039; containing: the number of Gbnd, the valence energy, and the conduction energy.&lt;br /&gt;
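A minimal way to build it, assuming the output files are named &#039;&#039;o-GbndXX.qp&#039;&#039; as above, could be:&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # sketch: one row per GbndRnge value: number of bands, valence QP energy, conduction QP energy&lt;br /&gt;
 rm -f Gbnd_conv.dat&lt;br /&gt;
 for gb in 10 20 30 40 50 60 70 80; do&lt;br /&gt;
   qp=o-Gbnd${gb}.qp&lt;br /&gt;
   # grep on a single file (no filename prefix), so Eo and E-Eo are columns 3 and 4&lt;br /&gt;
   ev=$(grep &amp;quot;7 * 8&amp;quot; $qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   ec=$(grep &amp;quot;7 * 9&amp;quot; $qp | awk &#039;{print $3+$4}&#039;)&lt;br /&gt;
   echo &amp;quot;$gb $ev $ec&amp;quot; &amp;gt;&amp;gt; Gbnd_conv.dat&lt;br /&gt;
 done&lt;br /&gt;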
Now, as done before we can plot the valence and conduction quasiparticle levels separately as well as the gap, as a function of the number of bands used in the summation:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2 w lp lt 7  t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3 w lp lt 7  t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2) w lp lt 7  t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=1 caption=&amp;quot;Quasiparticle energies with respect to the sum over states in the correlation self-energy&amp;quot;&amp;gt;&lt;br /&gt;
File:Gbnd_val.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;.&lt;br /&gt;
File:Gbnd_cond.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:Gbnd_gap.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inspecting the plot we can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is rather slow and many bands are needed to get converged results.&lt;br /&gt;
* As observed above the gap (energy difference) converges faster than the single quasiparticle state energies.&lt;br /&gt;
&lt;br /&gt;
====Accelerating the sum over states convergence in the correlation self-energy====&lt;br /&gt;
In general the convergence with respect to the sum over states can be very cumbersome. Here we show how it can be mitigated by using &lt;br /&gt;
a technique developed by F. Bruneval and X. Gonze &amp;lt;ref name=&amp;quot;BG&amp;quot;&amp;gt; F. Bruneval and X. Gonze, Physical Review B 78, 085125 (2008)&amp;lt;/ref&amp;gt;. The basic idea is to replace the eigenenergies of the states that are not treated explicitly by a common energy, and to take into account all the states not explicitly included in the calculation through the closure relation.&lt;br /&gt;
To apply this technique in Yambo we need to activate the optional terminator variable [[Variables#GTermKind|GTermKind]]. We can activate it by adding the quasiparticle verbosity to the command line when generating the input file:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -p p -g n -F gw_ppa_Gbnd10.in -V qp&lt;br /&gt;
&lt;br /&gt;
and you can edit the input file by setting:&lt;br /&gt;
 &lt;br /&gt;
 [[Variables#GTermKind|GTermKind]]= &amp;quot;BG&amp;quot;&lt;br /&gt;
&lt;br /&gt;
or you can simply add this line by hand in all the other &#039;&#039;gw_ppa_GbndX.in&#039;&#039; input files.&lt;br /&gt;
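&lt;br /&gt;
For instance, a one-line loop that appends the variable only to the files that do not already contain it (a sketch, assuming the file names used above):&lt;br /&gt;
 $ for f in gw_ppa_Gbnd*.in; do grep -q GTermKind $f || echo &#039;GTermKind= &amp;quot;BG&amp;quot;&#039; &amp;gt;&amp;gt; $f; done&lt;br /&gt;
&lt;br /&gt;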
Now we can repeat the same calculations &lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd10.in -J Gbnd10_term&lt;br /&gt;
 $ yambo -F gw_ppa_Gbnd20.in -J Gbnd20_term&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
and collect the new results:&lt;br /&gt;
&lt;br /&gt;
 $ grep &amp;quot;7 * 8&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039;&lt;br /&gt;
 $ grep &amp;quot;7 * 9&amp;quot; *term.qp  | awk &#039;{print $4+$5}&#039; &lt;br /&gt;
&lt;br /&gt;
in a new file called &#039;&#039;Gbnd_conv_terminator.dat&#039;&#039;. Now we can plot the same quantities as before by looking at the effect of having introduced the terminator. &lt;br /&gt;
&lt;br /&gt;
 gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:2 w lp t &amp;quot;Valence with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:3 w lp t &amp;quot;Conduction with terminator&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;Gbnd_conv.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;, &amp;quot;Gbnd_conv_terminator.dat&amp;quot; u 1:($3-$2) w lp t &amp;quot;Gap with terminator&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Effect of the terminator on the convergence of QP energies with respect to the sum over states&amp;quot;&amp;gt;&lt;br /&gt;
File:val_t.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique &amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:cond_t.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
File:gap_t.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; with and without the terminator technique&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
* The convergence with respect to &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt; is accelerated, in particular for the single quasiparticle states.&lt;br /&gt;
* From the plot above we can see that &amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]]&amp;lt;/code&amp;gt;=40 is sufficient to have a converged gap.&lt;br /&gt;
&lt;br /&gt;
===Beyond the plasmon pole approximation: a full frequency approach - real axis integration===&lt;br /&gt;
All the calculations performed up to now were based on the plasmon pole approximation (PPA). Now we remove this approximation by evaluating numerically the frequency integral in the expression of the correlation self-energy (Σc). To this aim we need to evaluate the screening for a number of frequencies in an interval determined by the electron-hole energy differences (energy differences between empty and occupied states) entering the sum over states. &lt;br /&gt;
Let&#039;s build the input file for a full frequency calculation by simply typing:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff.in -g n -p r&lt;br /&gt;
&lt;br /&gt;
and we can set the variables studied up to now to their converged values:&lt;br /&gt;
&lt;br /&gt;
 %[[Variables#BndsRnXd|BndsRnXd]]&lt;br /&gt;
 1 | 30 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#NGsBlkXd|NGsBlkXd]]=3000 mRy&lt;br /&gt;
 %[[Variables#GbndRnge|GbndRnge]]&lt;br /&gt;
 1 | 40 |&lt;br /&gt;
 %&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]]=40000 mRy&lt;br /&gt;
 % [[Variables#LongDrXd|LongDrXd]] &lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xd] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 %QPkrange                      # # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|8|9|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and next vary, in different files (name them &#039;&#039;gw_ff10.in&#039;&#039; etc.), the number of frequencies at which we evaluate the screened Coulomb potential. This is done by setting [[Variables#ETStpsXd|ETStpsXd]]=10, 50, 100, 150, 200, 250. &lt;br /&gt;
You can also run the [[bash_scripts|generate_inputs_3.sh]] bash script to generate the required input files.&lt;br /&gt;
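&lt;br /&gt;
If you prefer to write it yourself, a minimal sketch of such a loop (not necessarily identical to &#039;&#039;generate_inputs_3.sh&#039;&#039;; it assumes the template &#039;&#039;gw_ff.in&#039;&#039; already contains an &amp;lt;code&amp;gt;ETStpsXd&amp;lt;/code&amp;gt; line, otherwise add it first) could be:&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 # sketch: vary the number of frequencies used to sample the screening on the real axis&lt;br /&gt;
 for nw in 10 50 100 150 200 250; do&lt;br /&gt;
   out=gw_ff${nw}.in&lt;br /&gt;
   cp gw_ff.in $out&lt;br /&gt;
   sed -i &amp;quot;s/ETStpsXd=.*/ETStpsXd= ${nw}/&amp;quot; $out&lt;br /&gt;
 done&lt;br /&gt;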
&lt;br /&gt;
Next launch yambo:&lt;br /&gt;
 $ yambo -F gw_ff10.in -J ff10&lt;br /&gt;
 $ yambo -F gw_ff50.in -J ff50&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Clearly, as we are evaluating the screening for a large number of frequencies, these calculations will be heavier than the cases above, where the screening was calculated for only two frequencies (zero and the plasmon-pole frequency). &lt;br /&gt;
As before, collect the valence and conduction bands as a function of the number of frequencies in a file called &#039;&#039;gw_ff.dat&#039;&#039; and plot the behaviour of the conduction and valence bands and the gap.&lt;br /&gt;
 &lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:2  w lp t &amp;quot;Valence&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:3  w lp t &amp;quot;Conduction&amp;quot;&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;gw_ff.dat&amp;quot; u 1:($3-$2)  w lp t &amp;quot;Gap&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines widths=500px heights=500px perrow=3 caption=&amp;quot;Real-axis integration: convergence with respect to the number of frequencies used&amp;quot;&amp;gt;&lt;br /&gt;
File:ff_v.png|Valence band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt;&lt;br /&gt;
File:ff_c.png|Conduction band energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
File:ff_g.png|Gap energy wrt &amp;lt;code&amp;gt;[[Variables#ETStpsXd|ETStpsXd]]&amp;lt;/code&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can see that:&lt;br /&gt;
&lt;br /&gt;
* Oscillations are still present, indicating that even more frequencies have to be considered. In general, a real-axis calculation is very demanding. &lt;br /&gt;
* The gap obtained so far does not differ much from the one obtained at the plasmon-pole level (~50 meV).&lt;br /&gt;
&lt;br /&gt;
As the real-axis calculation is computationally demanding, in recent years an alternative method called the Multipole Approximation (MPA) has been developed in Yambo. The MPA can be seen as an extension of the PPA, where a finite number of poles is used instead of just one. It has been found that even a small number of poles makes the MPA results on par with a full real-axis calculation. For your reference, a tutorial on the MPA method is available at [[Quasi-particles_and_Self-energy_within_the_Multipole_Approximation_(MPA)]]. For now, however, let us move on to the next section.&lt;br /&gt;
&lt;br /&gt;
===Secant Solver===&lt;br /&gt;
&lt;br /&gt;
The real-axis integration also permits going beyond the linear expansion in the solution of the quasiparticle equation. &lt;br /&gt;
The QP equation is a non-linear equation whose solution must be found using a suitable numerical algorithm. &lt;br /&gt;
[[File:Eqp_sec.png|none|x22px|caption]] &lt;br /&gt;
The most widely used algorithm, based on the linearization of the self-energy operator, is the Newton method, which is the one we have used up to now. &lt;br /&gt;
Yambo can also perform a search of the QP energies using a non-linear iterative method based on the [https://en.wikipedia.org/wiki/Secant_method Secant iterative Method].&lt;br /&gt;
&lt;br /&gt;
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines to better approximate a root of a function &#039;&#039;f&#039;&#039;. The secant method can be thought of as a finite difference approximation of Newton&#039;s method. &lt;br /&gt;
The equation that defines the secant method is: &lt;br /&gt;
&lt;br /&gt;
[[File:secant_eq.png|none|x35px|caption]] &lt;br /&gt;
&lt;br /&gt;
The first two iterations of the secant method are shown in the following picture. The red curve shows the function f and the blue lines are the secants.&lt;br /&gt;
&lt;br /&gt;
[[File:Secant_method.png|center|250px|caption]] &lt;br /&gt;
&lt;br /&gt;
To see if there is any non-linear effect in the solution of the Dyson equation, we compare the result obtained with the Newton solver, as done before, with the present case. &lt;br /&gt;
In order to use the secant method you need to edit one of the previous &#039;&#039;gw_ffN.in&#039;&#039; files, e.g. &#039;&#039;gw_ff100.in&#039;&#039;, and change:&lt;br /&gt;
&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;&lt;br /&gt;
to &lt;br /&gt;
 DysSolver= &amp;quot;s&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Repeat the calculations in the same way as before:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ff100.in -J ff100&lt;br /&gt;
&lt;br /&gt;
Note that now the screening will &#039;&#039;not&#039;&#039; be calculated again, as it has been stored in the &#039;&#039;ffN&#039;&#039; directories, as can be seen in the report file:&lt;br /&gt;
&lt;br /&gt;
 [05] Dynamical Dielectric Matrix&lt;br /&gt;
 ================================&lt;br /&gt;
 [RD./ff10//ndb.em1d]----&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./ff10//ndb.em1d_fragment_1]--------------&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Comparing the output files, e.g. for the case with 100 frequencies:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp&#039;&#039;  &#039;&#039;&#039;Newton Solver:&#039;&#039;&#039;&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
   7   8  -0.41188   -0.08708      2.91254 &lt;br /&gt;
   7   9   3.877976   1.421968    -3.417357 &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;o-ff100.qp_01&#039;&#039; &#039;&#039;&#039;Secant Solver:&#039;&#039;&#039;&lt;br /&gt;
 #&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
   7   8   -0.41188   -0.08715      2.93518 &lt;br /&gt;
   7   9   3.877976   1.401408    -3.731649 &lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
From the comparison, we see that the effect is of the order of 20 meV, of the same order of magnitude as the accuracy of GW calculations.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Interpolating Band Structures==&lt;br /&gt;
Up to now we have checked convergence for the gap. Now we want to calculate the quasiparticle corrections across the Brillouin zone in order to visualize the entire band structure along a path connecting high symmetry points.&lt;br /&gt;
&lt;br /&gt;
To do that we start by calculating the QP correction in the plasmon-pole approximation for all the k points of our sampling and for a number of bands around the gap. You can use a previous input file or generate a new one: &amp;lt;code&amp;gt; yambo -F gw_ppa_all_Bz.in -x -p p -g n &amp;lt;/code&amp;gt; and set the parameters found in the previous tests:&lt;br /&gt;
&lt;br /&gt;
 EXXRLvcs=  40        Ry &lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
   1 | 30 |                 # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 3000            mRy    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
  1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138     eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
   1 | 40 |                 # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
and we calculate it for all the available kpoints by setting:&lt;br /&gt;
 %QPkrange                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
  1| 14|  6|11|&lt;br /&gt;
 %&lt;br /&gt;
 &lt;br /&gt;
Note that, as we have already calculated the screening with these parameters, you can save time and reuse that database: either run in the previous directory using &amp;lt;code&amp;gt; -J 30b_3Ry &amp;lt;/code&amp;gt;, or, if you prefer to keep the new databases in a new all_Bz directory, create the directory and copy the screening databases there:&lt;br /&gt;
&lt;br /&gt;
 $ mkdir all_Bz &lt;br /&gt;
 $ cp ./30b_3Ry/ndb.pp* ./all_Bz/&lt;br /&gt;
&lt;br /&gt;
and launch the calculation:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -F gw_ppa_all_Bz.in -J all_Bz&lt;br /&gt;
&lt;br /&gt;
Now we can inspect the output and see that it contains the corrections for all the k-points and for the bands we asked for:&lt;br /&gt;
&lt;br /&gt;
 #  K-point    Band       Eo         E-Eo       Sc|Eo&lt;br /&gt;
 #&lt;br /&gt;
    1.000000     6.000000    -1.299712    -0.219100     3.788044&lt;br /&gt;
    1.000000     7.000000    -1.296430    -0.241496     3.788092&lt;br /&gt;
    1.000000     8.000000    -1.296420    -0.243115     3.785947&lt;br /&gt;
    1.000000     9.000000     4.832399     0.952386    -3.679259&lt;br /&gt;
    1.00000     10.00000     10.76416      2.09915     -4.38743&lt;br /&gt;
    1.00000     11.00000     11.36167      2.48053     -3.91021&lt;br /&gt;
....&lt;br /&gt;
By plotting some of the &#039;&#039;o-all_Bz.qp&#039;&#039; columns it is possible to discuss some physical properties of the hBN QPs. Using columns 3 and (3+4), i.e. plotting the GW energies with respect to the LDA energies, we can deduce the band gap renormalization and the stretching of the conduction/valence bands:&lt;br /&gt;
&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p  t &amp;quot;Eqp vs Elda&amp;quot; &lt;br /&gt;
&lt;br /&gt;
[[File:EqpvE0.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
Essentially we can see that the effect of the GW self-energy is the opening of the gap and a linear stretching of the conduction/valence bands, which can be estimated by performing a linear fit of the positive and negative energies (the zero is set at the top of the valence band), as sketched below. &lt;br /&gt;
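&lt;br /&gt;
A quick way to estimate the two stretching coefficients is gnuplot&#039;s &amp;lt;code&amp;gt;fit&amp;lt;/code&amp;gt; command (a minimal sketch; the slopes bv and bc give the valence and conduction stretching):&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; fv(x) = av + bv*x&lt;br /&gt;
 gnuplot&amp;gt; fc(x) = ac + bc*x&lt;br /&gt;
 gnuplot&amp;gt; # fit the occupied (Elda &amp;lt; 0) and empty (Elda &amp;gt; 0) states separately&lt;br /&gt;
 gnuplot&amp;gt; fit [:0] fv(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via av,bv&lt;br /&gt;
 gnuplot&amp;gt; fit [0:] fc(x) &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) via ac,bc&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o-all_Bz.qp&amp;quot; u 3:($3+$4) w p t &amp;quot;Eqp vs Elda&amp;quot;, fv(x) t &amp;quot;valence fit&amp;quot;, fc(x) t &amp;quot;conduction fit&amp;quot;&lt;br /&gt;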
&lt;br /&gt;
In order to calculate the band structure, however, we need to interpolate the values we have calculated above on a given path. In Yambo the interpolation is done by the executable &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; (Yambo Post Processing).&lt;br /&gt;
By typing: &lt;br /&gt;
 $ ypp -h  &lt;br /&gt;
you will recognize that in order to interpolate the bands we need to build a ypp input file using&lt;br /&gt;
 $ ypp -s b&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Before editing the &#039;&#039;ypp.in&#039;&#039; input file and running the interpolation, it is important to know that &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; uses an algorithm &amp;lt;ref&amp;gt; Warren E. Pickett, Henry Krakauer, and Philip B. Allen Phys. Rev. B 38, 2721 &amp;lt;/ref&amp;gt;  that cannot be used in presence of time-reversal (TR) symmetry. &lt;br /&gt;
As a first step we therefore remove the TR symmetry by typing:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -y&lt;br /&gt;
&lt;br /&gt;
and we uncomment the corresponding line to remove the TR.&lt;br /&gt;
&lt;br /&gt;
 fixsyms                      # [R] Reduce Symmetries&lt;br /&gt;
 #RmAllSymm                   # Remove all symmetries&lt;br /&gt;
 #RmTimeRev                   # Remove Time Reversal&lt;br /&gt;
&lt;br /&gt;
and launch&lt;br /&gt;
 &lt;br /&gt;
 $ ypp&lt;br /&gt;
&lt;br /&gt;
This will create a new directory called &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; where a &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory containing the electronic structure in the absence of TR is present. &lt;br /&gt;
We will calculate the band structure in the &amp;lt;code&amp;gt;FixSymm&amp;lt;/code&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
 $ cd FixSymmm&lt;br /&gt;
&lt;br /&gt;
After having performed the usual setup&lt;br /&gt;
&lt;br /&gt;
 $yambo&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
we can generate the input for the band interpolation:&lt;br /&gt;
&lt;br /&gt;
 $ ypp -s b -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
and edit the &#039;&#039;ypp_bands.in&#039;&#039; file:&lt;br /&gt;
&lt;br /&gt;
 electrons                    # [R] Electrons (and holes)&lt;br /&gt;
 bnds                         # [R] Bands&lt;br /&gt;
 INTERP_mode= &amp;quot;BOLTZ&amp;quot;         # Interpolation mode (NN=nearest point, BOLTZ=boltztrap aproach) &lt;br /&gt;
 OutputAlat= 4.716000           # [a.u.] Lattice constant used for &amp;quot;alat&amp;quot; ouput format&lt;br /&gt;
 cooIn= &amp;quot;rlu&amp;quot;                   # Points coordinates (in) cc/rlu/iku/alat&lt;br /&gt;
 cooOut= &amp;quot;rlu&amp;quot;     &lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   1 | 100 |                   # Number of bands&lt;br /&gt;
 %&lt;br /&gt;
 INTERP_Shell_Fac= 20.00000     # The bigger it is a higher number of shells is used&lt;br /&gt;
 CIRCUIT_E_DB_path= &amp;quot;none&amp;quot;      # SAVE obtained from the QE `bands` run (alternative to %BANDS_kpts)&lt;br /&gt;
 BANDS_path= &amp;quot;&amp;quot;                 # BANDS path points labels (G,M,K,L...)&lt;br /&gt;
 BANDS_steps= 10  &lt;br /&gt;
 #BANDS_built_in                # Print the bands of the generating points of the circuit using the nearest internal point&lt;br /&gt;
 %BANDS_kpts   &lt;br /&gt;
 % &lt;br /&gt;
&lt;br /&gt;
We modify the following lines:&lt;br /&gt;
 BANDS_steps=30&lt;br /&gt;
 % BANDS_bands&lt;br /&gt;
   6 | 11 |                   # Number of bands &lt;br /&gt;
 %&lt;br /&gt;
 %BANDS_kpts                    # K points of the bands circuit&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.00000 |0.00000 |0.00000 |&lt;br /&gt;
  0.50000 |-.50000 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.00000 |&lt;br /&gt;
  0.33300 |-.66667 |0.50000 |&lt;br /&gt;
  0.00000 |0.00000 |0.50000 |&lt;br /&gt;
  0.50000 |-.50000 |0.50000 |&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
which means we assign 30 points to each segment, we ask to interpolate 3 occupied and 3 empty bands, and we assign the following path passing through the high-symmetry points K Γ M K H A L.&lt;br /&gt;
Launching:&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
will produce the output file &#039;&#039;o.bands_interpolated&#039;&#039; containing:&lt;br /&gt;
&lt;br /&gt;
                 &lt;br /&gt;
 #&lt;br /&gt;
 #   |k|        b6         b7         b8         b9         b10        b11        kx         ky         kz&lt;br /&gt;
 #&lt;br /&gt;
 #&lt;br /&gt;
     0.00000   -7.22092   -0.13402   -0.13395    4.67691    4.67694   10.08905    0.33300   -0.66667    0.00000&lt;br /&gt;
     0.03725   -7.18857   -0.17190   -0.12684    4.66126    4.71050   10.12529    0.32190   -0.64445    0.00000&lt;br /&gt;
...&lt;br /&gt;
 &lt;br /&gt;
and we can plot the bands using gnuplot:&lt;br /&gt;
 $ gnuplot&lt;br /&gt;
 gnuplot&amp;gt; p &amp;quot;o.bands_interpolated&amp;quot; u 0:2 w l, &amp;quot;o.bands_interpolated&amp;quot; u 0:3 w l, ...&lt;br /&gt;
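&lt;br /&gt;
To avoid typing one clause per band, you can also use gnuplot&#039;s column iteration (a minimal sketch; the six interpolated bands sit in columns 2-7 as shown above):&lt;br /&gt;
 gnuplot&amp;gt; p for [i=2:7] &amp;quot;o.bands_interpolated&amp;quot; u 0:i w l notitle&lt;br /&gt;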
&lt;br /&gt;
[[File:bands_lda.png|center|350px|caption]] &lt;br /&gt;
&lt;br /&gt;
and you can recognize the positions of the high-symmetry points along the path by inspecting the last three columns.&lt;br /&gt;
Note that up to now we have interpolated the LDA band structure. In order to plot the GW band structure, we need to tell &amp;lt;code&amp;gt;ypp&amp;lt;/code&amp;gt; in the input file where the &#039;&#039;ndb.QP&#039;&#039; database is found. This is achieved by adding in the &#039;&#039;ypp_bands.in&#039;&#039; file the line:&lt;br /&gt;
&lt;br /&gt;
  GfnQPdb= &amp;quot;E &amp;lt; ./all_Bz/ndb.QP&amp;quot;&lt;br /&gt;
&lt;br /&gt;
and relaunch &lt;br /&gt;
&lt;br /&gt;
 $ ypp -F ypp_bands.in&lt;br /&gt;
&lt;br /&gt;
Now the file &#039;&#039;o.bands_interpolated_01&#039;&#039; contains the GW interpolated band structure. We can plot the LDA and GW band structure together by using the gnuplot script [[gnuplot_scripts|bands.gnu]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=nolines  widths=500px heights=500px  perrow=2 caption=&amp;quot;Band structure of bulk hBN&amp;quot;&amp;gt;&lt;br /&gt;
File:hBN_bands.png| LDA and GW band structures &lt;br /&gt;
File:hBN_bands_lit.png| LDA and GW band structures from Ref. &amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; &lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*As expected the effect of the GW correction is to open the gap.&lt;br /&gt;
*Comparing the obtained band structure with the one found in the literature by Arnaud and coworkers &amp;lt;ref name=&amp;quot;Arnaud&amp;quot;&amp;gt; B. Arnaud, S. Lebegue, P. Rabiller, and M. Alouani, Phys. Rev. Lett. 96, 026402 (2006)&amp;lt;/ref&amp;gt; we find a very nice qualitative agreement. &lt;br /&gt;
*Quantitatively we find a smaller gap: about 5.2 eV (indirect gap) and 5.7 eV (direct gap), while Ref.&amp;lt;ref name=&amp;quot;Arnaud&amp;quot; /&amp;gt; reports 5.95 eV for the indirect gap and a minimum direct band gap of 6.47 eV. Other values are also reported in the literature depending on the pseudopotentials, starting functional and type of self-consistency used (see below). &lt;br /&gt;
*The present tutorial has been done with a small k-point grid, which is an important parameter to be checked, so convergence with respect to the k-point sampling has to be validated.&lt;br /&gt;
&lt;br /&gt;
==Step 4: Summary of the convergence parameters==&lt;br /&gt;
We have calculated the band structure of hBN starting from a DFT calculation; here we summarize the main variables we have checked to achieve convergence:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;[[Variables#EXXRLvcs|EXXRLvcs]]&amp;lt;/code&amp;gt; # [XX] Exchange RL components&lt;br /&gt;
Number of G-vectors in the exchange. This number should be checked carefully. Generally a large number is needed, as the QP energies show a slow convergence, but the calculation of the exchange part is rather fast. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#BndsRnXp|BndsRnXp]]&amp;lt;/code&amp;gt; #[Xp] Polarization function bands&lt;br /&gt;
Number of bands in the independent-particle response function from which the dielectric matrix is calculated. This parameter also has to be checked carefully, together with NGsBlkXp, as the two variables are interconnected.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#NGsBlkXp|NGsBlkXp]]&amp;lt;/code&amp;gt;  # [Xp] Response block size&lt;br /&gt;
Number of G-vector blocks in the dielectric matrix. This parameter also has to be checked carefully, together with BndsRnXp. A large number of bands and blocks can make the calculation very demanding.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#LongDrXp|LongDrXp]] &amp;lt;/code&amp;gt; # [Xp] [cc] Electric Field&lt;br /&gt;
Direction of the electric field for the calculation of the q=0 component of the dielectric function e(q,w). In a bulk system it can be set to (1,1,1); attention must be paid for non-3D systems.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#PPAPntXp|PPAPntXp]] &amp;lt;/code&amp;gt; # [Xp] Plasmon pole imaginary energy: this is the second frequency used to fit the Godby-Needs plasmon-pole model (PPM). If the results change appreciably when this frequency is varied, the PPM is not adequate for your calculation and you need to go beyond it, e.g. with real-axis integration. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GbndRnge|GbndRnge]] &amp;lt;/code&amp;gt; # [GW] G[W] bands range&lt;br /&gt;
Number of bands used to expand the Green&#039;s function. This number is usually larger than the number of bands used to calculate the dielectric matrix. Single quasiparticle energies converge slowly with respect to GbndRnge; energy differences behave better. You can use the terminator technique to mitigate the slow convergence. &lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GDamping|GDamping]] &amp;lt;/code&amp;gt; # [GW] G[W] damping&lt;br /&gt;
Small damping in the Green&#039;s function definition, the delta &lt;br /&gt;
parameter. The final result should not depend on it; it is usually set to 0.1 eV.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#dScStep|dScStep]] &amp;lt;/code&amp;gt; # [GW] &lt;br /&gt;
Energy step to evaluate Z factors&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#DysSolver|DysSolver]] &amp;lt;/code&amp;gt; # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
Solver used for the Dyson equation: &amp;quot;n&amp;quot; Newton linearization, &amp;quot;s&amp;quot; non-linear secant method.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#GTermKind|GTermKind]] &amp;lt;/code&amp;gt; # [GW] GW terminator &lt;br /&gt;
Terminator for the self-energy&amp;lt;ref name=&amp;quot;BG&amp;quot; /&amp;gt;. We have seen how this speeds up the convergence with respect to the number of empty bands.&lt;br /&gt;
&lt;br /&gt;
*&amp;lt;code&amp;gt;[[Variables#%QPkrange |QPkrange ]] &amp;lt;/code&amp;gt; # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
K-points and band range where you want to calculate the GW correction. The syntax is&lt;br /&gt;
first kpoint | last kpoint | first band | last band&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=First_steps:_walk_through_from_DFT(standalone)&amp;diff=6947</id>
		<title>First steps: walk through from DFT(standalone)</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=First_steps:_walk_through_from_DFT(standalone)&amp;diff=6947"/>
		<updated>2023-05-24T15:25:26Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Runlevel selection */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In this tutorial you will learn how to calculate optical spectra using Yambo, starting from a DFT calculation and ending with a look at local field effects in the optical response. &lt;br /&gt;
&lt;br /&gt;
== System characteristics ==&lt;br /&gt;
We will use a 3D system (bulk hBN) and a 2D system (hBN sheet). &lt;br /&gt;
&lt;br /&gt;
[[File:HBN-bulk-3x3-annotated.png|x200px|Atomic structure of bulk hBN]]&lt;br /&gt;
[[File:HBN2.png|x200px|Atomic structure of 2D hBN]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hexagonal boron nitride - hBN&#039;&#039;&#039;:&lt;br /&gt;
* HCP lattice, ABAB stacking&lt;br /&gt;
* Four atoms per cell, B and N (16 electrons)&lt;br /&gt;
* Lattice constants: &#039;&#039;a&#039;&#039; = 4.716 [a.u.], &#039;&#039;c/a&#039;&#039; = 2.582&lt;br /&gt;
* Plane wave cutoff 40 Ry (~1500 RL vectors in wavefunctions)&lt;br /&gt;
* SCF run: shifted &#039;&#039;6x6x2&#039;&#039; grid (12 k-points) with 8 bands &lt;br /&gt;
* Non-SCF run: gamma-centred &#039;&#039;6x6x2&#039;&#039; (14 k-points) grid with 100 bands&lt;br /&gt;
&lt;br /&gt;
=== Prerequisites ===&lt;br /&gt;
&#039;&#039;&#039;You will need&#039;&#039;&#039;:&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* PWSCF input files and pseudopotentials for hBN bulk &lt;br /&gt;
* &amp;lt;code&amp;gt;pw.x&amp;lt;/code&amp;gt; executable, version 5.0 or later&lt;br /&gt;
* &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; executables&lt;br /&gt;
* &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; for plotting spectra&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* Before starting, [[Get Tutorial files CECAM2021|get tutorial files]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
* Before starting, get the hBN tutorial files [https://www.yambo-code.eu/wiki/index.php/Tutorials#Tutorial_files here]&lt;br /&gt;
* &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; executable&lt;br /&gt;
* &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; for plotting spectra&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
==Download the Files==&lt;br /&gt;
&lt;br /&gt;
Download and unpack the followint files:&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN.tar.gz hBN.tar.gz] [15 MB],&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN-2D.tar.gz hBN-2D.tar.gz] [8,6 MB]&lt;br /&gt;
&lt;br /&gt;
In the next days you could also use this file which you may like to download now&lt;br /&gt;
[https://www.yambo-code.org/educational/tutorials/files/hBN-convergence-kpoints.tar.gz hBN-convergence-kpoints.tar.gz] [254 MB]&lt;br /&gt;
&lt;br /&gt;
After downloading the tar.gz files just unpack them in the &#039;&#039;&#039;YAMBO_TUTORIALS&#039;&#039;&#039; folder. For example&lt;br /&gt;
 $ mkdir YAMBO_TUTORIALS&lt;br /&gt;
 $ mv hBN.tar.gz YAMBO_TUTORIALS&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ tar -xvfz hBN.tar.gz&lt;br /&gt;
 $ ls YAMBO_TUTORIALS&lt;br /&gt;
   hBN&lt;br /&gt;
 $ cd hBN&lt;br /&gt;
 $ ls&lt;br /&gt;
 $ PWSCF YAMBO&lt;br /&gt;
&lt;br /&gt;
(Advanced users can download and install all tutorial files using git. See the main  [[Tutorials#Files|Tutorial Files]] page.)&lt;br /&gt;
&lt;br /&gt;
Now you can go directly in &#039;&#039;&#039;YAMBO&#039;&#039;&#039; folder where you can find the &#039;&#039;&#039;SAVE&#039;&#039;&#039; folder which is needed to start  and go directly to &#039;&#039;&#039;[Initialization of Yambo databases]&#039;&#039;&#039;  below, which is always the first step you have to perform for any simulation using the Yambo code.&lt;br /&gt;
&lt;br /&gt;
Or if you wish you can learn  also how to start from the DFT simulations doing a scf and nscf calculation, entering in &#039;&#039;&#039;PWSCF&#039;&#039;&#039; folder. In this way you will see how you can create the &#039;&#039;&#039;SAVE&#039;&#039;&#039; folder starting from *.save directory produced by pw.x.&lt;br /&gt;
&lt;br /&gt;
==DFT calculation of bulk hBN and conversion to Yambo==&lt;br /&gt;
&lt;br /&gt;
In this module you will learn how to generate the Yambo &#039;&#039;SAVE&#039;&#039; folder for bulk hBN starting from a PWscf calculation.&lt;br /&gt;
&lt;br /&gt;
=== DFT calculations ===&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/PWSCF&lt;br /&gt;
 $ ls&lt;br /&gt;
 Inputs		Pseudos		PostProcessing		References&lt;br /&gt;
 hBN_scf.in	hBN_nscf.in     hBN_scf_plot_bands.in  hBN_nscf_plot_bands.in &lt;br /&gt;
&lt;br /&gt;
First run the SCF calculation to generate the ground-state charge density, occupations, Fermi level, and so on:&lt;br /&gt;
 $ pw.x &amp;lt; hBN_scf.in &amp;gt; hBN_scf.out&lt;br /&gt;
Inspection of the output shows that the valence band maximum lies at 5.06eV.&lt;br /&gt;
&lt;br /&gt;
Next run a non-SCF calculation to generate a set of Kohn-Sham eigenvalues and eigenvectors for both occupied and unoccupied states (100 bands): &lt;br /&gt;
 $ pw.x &amp;lt; hBN_nscf.in &amp;gt; hBN_nscf.out                  &#039;&#039;(serial run, ~1 min) OR&#039;&#039;&lt;br /&gt;
 $ mpirun -np 2 pw.x &amp;lt; hBN_nscf.in &amp;gt; hBN_nscf.out       &#039;&#039;(parallel run, 40s)&#039;&#039;&lt;br /&gt;
Here we use a &#039;&#039;6x6x2&#039;&#039; grid giving 14 k-points, but denser grids should be used for checking convergence of Yambo runs.&lt;br /&gt;
&lt;br /&gt;
Note the presence of the following flags in the input file:&lt;br /&gt;
 wf_collect=.true.&lt;br /&gt;
 force_symmorphic=.true.&lt;br /&gt;
 diago_thr_init=5.0e-6,&lt;br /&gt;
 diago_full_acc=.true.&lt;br /&gt;
which are needed for generating the Yambo databases accurately. Full explanations of these variables are given on the [http://www.quantum-espresso.org/wp-content/uploads/Doc/INPUT_PW.html quantum-ESPRESSO input variables page]. &lt;br /&gt;
&lt;br /&gt;
After these two runs, you should have a &#039;&#039;hBN.save&#039;&#039; directory:&lt;br /&gt;
 $ ls hBN.save&lt;br /&gt;
 data-file.xml charge-density.dat gvectors.dat B.pz-vbc.UPF N.pz-vbc.UPF&lt;br /&gt;
 K00001	K00002 .... 	K00035	K00036&lt;br /&gt;
&lt;br /&gt;
=== Conversion to Yambo format ===&lt;br /&gt;
Once you have performed a nscf simulation with pw.x the PWscf &#039;&#039;bBN.save&#039;&#039; should not be empty and you can then convert it to the Yambo format using the &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt; executable (pwscf to yambo), found in the yambo &#039;&#039;bin&#039;&#039; directory.&lt;br /&gt;
Enter &#039;&#039;hBN.save&#039;&#039; and launch &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 $ cd hBN.save&lt;br /&gt;
 $ p2y&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; DBs path set to .&lt;br /&gt;
 &amp;lt;--%-&amp;gt; Index file set to data-file.xml&lt;br /&gt;
 &amp;lt;--%-&amp;gt; Header/K-points/Energies... done&lt;br /&gt;
 ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB1 (Gvecs and more) ...&lt;br /&gt;
 &amp;lt;--%-&amp;gt; ... Database done&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB2 (wavefunctions)  ... done ==&lt;br /&gt;
 &amp;lt;--%-&amp;gt; == DB3 (PseudoPotential) ... done ==&lt;br /&gt;
 &amp;lt;--%-&amp;gt;  == P2Y completed ==&lt;br /&gt;
&lt;br /&gt;
This output repeats some information about the system and generates a &#039;&#039;SAVE&#039;&#039; directory:&lt;br /&gt;
 $ ls SAVE&lt;br /&gt;
 ns.db1  ns.wf  ns.kb_pp_pwscf&lt;br /&gt;
 ns.wf_fragments_1_1 ...&lt;br /&gt;
 ns.kb_pp_pwscf_fragment_1 ...&lt;br /&gt;
These files, with an &#039;&#039;n&#039;&#039; prefix, indicate that they are in netCDF format, and thus not human readable. However, they are perfectly transferable across different architectures. You can check that the databases contain the information you expect by launching Yambo using the &amp;lt;code&amp;gt;-D&amp;lt;/code&amp;gt; option:&lt;br /&gt;
&lt;br /&gt;
 $ yambo -D&lt;br /&gt;
 [RD./SAVE//ns.db1]------------------------------------------&lt;br /&gt;
 Bands                           : 100&lt;br /&gt;
 K-points                        : 14&lt;br /&gt;
 G-vectors             [RL space]:  8029&lt;br /&gt;
 Components       [wavefunctions]: 1016&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./SAVE//ns.wf]-------------------------------------------&lt;br /&gt;
 Fragmentation                    :yes&lt;br /&gt;
 ...&lt;br /&gt;
 [RD./SAVE//ns.kb_pp_pwscf]----------------------------------&lt;br /&gt;
 Fragmentation                    :yes&lt;br /&gt;
 - S/N 006626 -------------------------- v.04.01.02 r.00000 -&lt;br /&gt;
&lt;br /&gt;
In practice we suggest to move the &#039;&#039;SAVE&#039;&#039; folder into a new clean folder. &lt;br /&gt;
&lt;br /&gt;
In this tutorial however, we ask instead that you continue using a &#039;&#039;SAVE&#039;&#039; folder that we prepared previously:&lt;br /&gt;
 $ cd ../../YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Initialization of Yambo databases==&lt;br /&gt;
&amp;lt;!-- Use the &#039;&#039;SAVE&#039;&#039; folders that are already provided, rather than any ones you may have generated previously.  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Every Yambo run &#039;&#039;&#039;must&#039;&#039;&#039; start with this step. Go to the folder &#039;&#039;containing&#039;&#039; the hBN-bulk &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; directory:&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: do not run yambo from &#039;&#039;inside&#039;&#039; the &amp;lt;code&amp;gt;SAVE&amp;lt;/code&amp;gt; folder!&lt;br /&gt;
&#039;&#039;&#039;This is the wrong way .. &#039;&#039;&#039;&lt;br /&gt;
 $ cd SAVE&lt;br /&gt;
 $ yambo&lt;br /&gt;
 yambo: cannot access CORE database (SAVE/*db1 and/or SAVE/*wf)&lt;br /&gt;
In fact, if you ever see such a message,&lt;br /&gt;
it usually means you are trying to launch Yambo &#039;&#039;&#039;from the wrong place&#039;&#039;&#039;.&lt;br /&gt;
 $ cd ..&lt;br /&gt;
&lt;br /&gt;
Now you are in the proper place and&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
you can simply launch the code&lt;br /&gt;
 $ yambo &lt;br /&gt;
This will run the initialization (setup) &#039;&#039;runlevel&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
===Run-time output===&lt;br /&gt;
This is typically written to standard output (on screen) and tracks the progress of the run in real time:&lt;br /&gt;
 &amp;lt;---&amp;gt; [01] MPI/OPENMP structure, Files &amp;amp; I/O Directories&lt;br /&gt;
 &amp;lt;---&amp;gt; [02] CORE Variables Setup&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.01] Unit cells&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.02] Symmetries&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.03] Reciprocal space&lt;br /&gt;
 &amp;lt;---&amp;gt; Shells finder |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.04] K-grid lattice&lt;br /&gt;
 &amp;lt;---&amp;gt; Grid dimensions      :   6   6   2&lt;br /&gt;
 &amp;lt;---&amp;gt; [02.05] Energies &amp;amp; Occupations&lt;br /&gt;
 &amp;lt;---&amp;gt; [03] Transferred momenta grid and indexing&lt;br /&gt;
 &amp;lt;---&amp;gt; BZ -&amp;gt; IBZ reduction |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [03.01] X indexes&lt;br /&gt;
 &amp;lt;---&amp;gt; X [eval] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; X[REDUX] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [03.01.01] Sigma indexes&lt;br /&gt;
 &amp;lt;---&amp;gt; Sigma [eval] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; Sigma[REDUX] |########################################| [100%] --(E) --(X)&lt;br /&gt;
 &amp;lt;---&amp;gt; [04] Timing Overview&lt;br /&gt;
 &amp;lt;---&amp;gt; [05] Memory Overview&lt;br /&gt;
 &amp;lt;---&amp;gt; [06] Game Over &amp;amp; Game summary&lt;br /&gt;
Specific runlevels are indicated with numeric labels like [02.02]. &amp;lt;br&amp;gt;&lt;br /&gt;
The hashes (#) indicate progress of the run in Wall Clock time, indicating the elapsed (E) and expected (X) time to complete a runlevel, and the percentage of the task complete.&lt;br /&gt;
In this case the simulation was so fast that there is no timing output. In longer simulations you will be able to appreciate this feature.&lt;br /&gt;
&lt;br /&gt;
===New core databases===&lt;br /&gt;
New databases appear in the &#039;&#039;SAVE&#039;&#039; folder:&lt;br /&gt;
 $ ls SAVE&lt;br /&gt;
 ns.db1 ns.wf ns.kb_pp_pwscf &#039;&#039;&#039;ndb.gops ndb.kindx&#039;&#039;&#039;&lt;br /&gt;
 ns.wf_fragments_1_1 ...&lt;br /&gt;
 ns.kb_pp_pwscf_fragment_1 ...&lt;br /&gt;
These contain information about the &#039;&#039;G&#039;&#039;-vector shells and &#039;&#039;k/q&#039;&#039;-point meshes as defined by the DFT calculation.&lt;br /&gt;
&lt;br /&gt;
In general: a database called &#039;&#039;n&#039;&#039;&#039;s&#039;&#039;&#039;.xxx&#039;&#039; is a &#039;&#039;static&#039;&#039; database, generated once by &amp;lt;code&amp;gt;p2y&amp;lt;/code&amp;gt;, while databases called &#039;&#039;n&#039;&#039;&#039;db&#039;&#039;&#039;.xxx&#039;&#039; are &#039;&#039;dynamically&#039;&#039; generated while you use &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: if you launch &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;, but it does not seem to do anything, check that these files are present.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Report file===&lt;br /&gt;
A report file &#039;&#039;r_setup&#039;&#039; is generated in the run directory. &lt;br /&gt;
This mostly reports information about the ground state system as defined by the DFT run, but also adds information about the band gaps, occupations, shells of G-vectors, IBZ/BZ grids, the CPU structure (for parallel runs), and so on. Some points of note:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Up to Yambo version 4.5&lt;br /&gt;
  [02.03] RL shells&lt;br /&gt;
  =================&lt;br /&gt;
  Shells, format: [S#] G_RL(mHa)&lt;br /&gt;
   [S453]:8029(0.7982E+5) [S452]:8005(0.7982E+5) [S451]:7981(0.7982E+5) [S450]:7957(0.7942E+5)&lt;br /&gt;
   ...&lt;br /&gt;
   [S4]:11( 1183.) [S3]:5( 532.5123) [S2]:3( 133.1281) [S1]:1( 0.000000)&lt;br /&gt;
&lt;br /&gt;
From Yambo version 5.0&lt;br /&gt;
  [02.03] Reciprocal space&lt;br /&gt;
  ========================&lt;br /&gt;
  &lt;br /&gt;
  nG shells         :  217&lt;br /&gt;
  nG charge         :   3187&lt;br /&gt;
  nG WFs            :  1477&lt;br /&gt;
  nC WFs            :  1016&lt;br /&gt;
  G-vecs. in first 21 shells:  [ Number ]&lt;br /&gt;
     1    3    5   11   13   25   37   39   51&lt;br /&gt;
    63   65   71   83   95  107  113  125  127&lt;br /&gt;
   139  151  163&lt;br /&gt;
  ...&lt;br /&gt;
  Shell energy in first 21 shells:  [ mHa ]&lt;br /&gt;
    0.00000      133.128      532.512      1183.37      1198.15      1316.50      1715.88      2130.05      2381.52&lt;br /&gt;
    3313.42      3328.20      3550.11      3683.24      4082.62      4511.57      4733.48      4748.27      4792.61&lt;br /&gt;
    4866.61      5266.00      5680.16&lt;br /&gt;
  ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This reports the set of closed reciprocal lattice (RL) shells defined internally that contain G-vectors with the same modulus. &lt;br /&gt;
The highest number of RL vectors we can use is 8029.  Yambo will always redefine any input variable in RL units to the nearest closed shell.&lt;br /&gt;
&lt;br /&gt;
Up to Yambo version 4.5&lt;br /&gt;
  [02.05] Energies [ev] &amp;amp; Occupations&lt;br /&gt;
  ===================================&lt;br /&gt;
  Fermi Level        [ev]:  5.112805&lt;br /&gt;
  VBM / CBm          [ev]:  0.000000  3.876293&lt;br /&gt;
  Electronic Temp. [ev K]:  0.00      0.00&lt;br /&gt;
  Bosonic    Temp. [ev K]:  0.00      0.00&lt;br /&gt;
  El. density      [cm-3]: 0.460E+24&lt;br /&gt;
  States summary         : Full        Metallic    Empty&lt;br /&gt;
                           0001-0008               0009-0100&lt;br /&gt;
  Indirect Gaps      [ev]: 3.876293  7.278081&lt;br /&gt;
  Direct Gaps        [ev]:  4.28829  11.35409&lt;br /&gt;
  X BZ K-points :  72&lt;br /&gt;
&lt;br /&gt;
From Yambo version 5.0&lt;br /&gt;
  [02.05] Energies &amp;amp; Occupations&lt;br /&gt;
  ==============================&lt;br /&gt;
  &lt;br /&gt;
  [X] === General ===&lt;br /&gt;
  [X] Electronic Temperature                        :  0.000000  0.000000 [eV K]&lt;br /&gt;
  [X] Bosonic    Temperature                        :  0.000000  0.000000 [eV K]&lt;br /&gt;
  [X] Finite Temperature mode                       : no&lt;br /&gt;
  [X] El. density                                   :  0.46037E+24 [cm-3]&lt;br /&gt;
  [X] Fermi Level                                   :  5.110835 [eV]&lt;br /&gt;
  &lt;br /&gt;
  [X] === Gaps and Widths ===&lt;br /&gt;
  [X] Conduction Band Min                           :  3.877976 [eV]&lt;br /&gt;
  [X] Valence Band Max                              :  0.000000 [eV]&lt;br /&gt;
  [X] Filled Bands                                  :   8&lt;br /&gt;
  [X] Empty Bands                                   :    9  100&lt;br /&gt;
  [X] Direct Gap                                    :  4.289853 [eV]&lt;br /&gt;
  [X] Direct Gap localized at k-point               :   7&lt;br /&gt;
  [X] Indirect Gap                                  :  3.877976 [eV]&lt;br /&gt;
  [X] Indirect Gap between k-points                 :  14   7&lt;br /&gt;
  [X] Last valence band width                       :  3.401086 [eV]&lt;br /&gt;
  [X] 1st conduction band width                     :  4.266292 [eV]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yambo recalculates the Fermi level (close to the value of 5.06 eV noted in the PWscf SCF calculation). From here on, however, the Fermi level is set to zero, and all other eigenvalues are shifted accordingly. The system is insulating (8 filled bands, 92 empty) with an indirect band gap of 3.87 eV. The direct and indirect gaps are also indicated. There are 72 k-points in the full BZ, generated using symmetry from the 14 k-points of our user-defined grid.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TIP&#039;&#039;&#039;: You should inspect the report file after &#039;&#039;every&#039;&#039; run for errors and warnings.&lt;br /&gt;
&lt;br /&gt;
===Different ways of running yambo===&lt;br /&gt;
Up to now we have simply run Yambo interactively.&lt;br /&gt;
&lt;br /&gt;
Let&#039;s try to re-run the setup with the command&lt;br /&gt;
 $ nohup yambo &amp;amp;&lt;br /&gt;
 $ ls&lt;br /&gt;
 l_setup  nohup.out  r_setup  r_setup_01  SAVE&lt;br /&gt;
&lt;br /&gt;
If Yambo is launched using a script, or as a background process, or in parallel, this output will appear in a log file prefixed by the letter &#039;&#039;l&#039;&#039;, in this case as &#039;&#039;l_setup&#039;&#039;.&lt;br /&gt;
If this log file already exists from a previous run, it will not be overwritten. Instead, a new file will be created with an incrementing numerical label, e.g. &#039;&#039;l_setup_01, l_setup_02&#039;&#039;, etc. &#039;&#039;&#039;This applies to all files created by Yambo&#039;&#039;&#039;. Here we see that &#039;&#039;l_setup&#039;&#039; was created for the first time, but &#039;&#039;r_setup&#039;&#039; already existed from the previous run, so we now have &#039;&#039;r_setup_01&#039;&#039;.&lt;br /&gt;
If you check the differences between the two report files, you will notice that in the second run Yambo reads the previously created &#039;&#039;ndb.kindx&#039;&#039; instead of recomputing the indexes.&lt;br /&gt;
Indeed, the output inside &#039;&#039;l_setup&#039;&#039; does not show the timing for X and Sigma.&lt;br /&gt;
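For instance, a quick way to spot this is to compare the two report files with the standard &amp;lt;code&amp;gt;diff&amp;lt;/code&amp;gt; tool (just one possible way; any file-comparison tool will do):&lt;br /&gt;
 $ diff r_setup r_setup_01&lt;br /&gt;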
&lt;br /&gt;
As a last step we run the setup in parallel, but first we delete the ndb.kindx file &lt;br /&gt;
 $ rm SAVE/ndb.kindx&lt;br /&gt;
 $ mpirun -np 4 yambo &lt;br /&gt;
 $ ls&lt;br /&gt;
 LOG  l_setup  nohup.out  r_setup  r_setup_01  r_setup_02  SAVE&lt;br /&gt;
There is now r_setup_02&lt;br /&gt;
In the case of parallel runs, CPU-dependent log files will appear inside a &#039;&#039;LOG&#039;&#039; folder, e.g. &lt;br /&gt;
 $ ls LOG&lt;br /&gt;
 l_setup_CPU_1   l_setup_CPU_2  l_setup_CPU_3  l_setup_CPU_4&lt;br /&gt;
This behaviour can be controlled at runtime - see the Parallel tutorial for details.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===2D hBN===&lt;br /&gt;
Simply repeat the steps above. Go to the folder &#039;&#039;containing&#039;&#039; the hBN-sheet &#039;&#039;SAVE&#039;&#039; directory and launch &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt;:&lt;br /&gt;
 $ cd TUTORIALS/hBN-2D/YAMBO&lt;br /&gt;
 $ ls&lt;br /&gt;
 SAVE&lt;br /&gt;
 $ yambo&lt;br /&gt;
Again, inspect the &#039;&#039;r_setup&#039;&#039; file, output logs, and verify that &#039;&#039;ndb.gops&#039;&#039; and &#039;&#039;ndb.kpts&#039;&#039; have been created inside the SAVE folder.&lt;br /&gt;
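A quick check (the exact listing depends on the DFT interface and Yambo version, so take this only as an indication) could be:&lt;br /&gt;
 $ ls SAVE&lt;br /&gt;
 ndb.gops  ndb.kpts  ns.db1 ...&lt;br /&gt;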
&lt;br /&gt;
You are now ready to use Yambo!&lt;br /&gt;
&lt;br /&gt;
==Yambo&#039;s command line interface==&lt;br /&gt;
Yambo uses a command line interface to select tasks, generate input files, and control the runtime behaviour. &lt;br /&gt;
&lt;br /&gt;
In this module you will learn how to select tasks, generate and modify input files, and control the runtime behaviour by using Yambo&#039;s command line interface.&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Command line options are divided into &#039;&#039;&#039;uppercase&#039;&#039;&#039; and &#039;&#039;&#039;lowercase&#039;&#039;&#039; options:&lt;br /&gt;
* Lowercase: select tasks, generate input files, and (by default) launch a file editor&lt;br /&gt;
* Uppercase: modify Yambo&#039;s default settings, at run time and when generating input files&lt;br /&gt;
Lowercase and uppercase options can be used together.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Input file generator ===&lt;br /&gt;
We are going to work again with bulk hBN.&lt;br /&gt;
First, move to the appropriate folder and initialize the Yambo databases if you haven&#039;t already done so.&lt;br /&gt;
 $ cd YAMBO_TUTORIALS/hBN/YAMBO&lt;br /&gt;
 $ yambo                    &#039;&#039;(initialize)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Yambo generates its own input files: you just tell the code what you want to calculate by launching Yambo along with one or more options. &lt;br /&gt;
To see the list of possible options, run &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt; (we report here only the part we are focusing on)&lt;br /&gt;
 $ yambo -h&lt;br /&gt;
 &#039;A shiny pot of fun and happiness [C.D.Hogan]&#039; &lt;br /&gt;
 &lt;br /&gt;
 This is      : yambo&lt;br /&gt;
 Version      : 5.0.1 Revision 19547 Hash e90d90f2d&lt;br /&gt;
 Configuration: MPI+OpenMP+SLK+SLEPC+HDF5_MPI_IO&lt;br /&gt;
 &lt;br /&gt;
 ...&lt;br /&gt;
 &lt;br /&gt;
 Initializations:&lt;br /&gt;
 -setup           (-i)            :Initialization&lt;br /&gt;
 -coulomb         (-r)            :Coulomb potential&lt;br /&gt;
 &lt;br /&gt;
 Response Functions:&lt;br /&gt;
 -optics          (-o) &amp;lt;string&amp;gt;   :Linear Response optical properties (more with -h optics)&lt;br /&gt;
 -X               (-d) &amp;lt;string&amp;gt;   :Inverse Dielectric Matrix (more with -h X)&lt;br /&gt;
 -dipoles         (-q)            :Oscillator strenghts (or dipoles)&lt;br /&gt;
 -kernel          (-k) &amp;lt;string&amp;gt;   :Kernel (more with -h kernel)&lt;br /&gt;
 &lt;br /&gt;
 Self-Energy:&lt;br /&gt;
 -hf              (-x)            :Hartree-Fock&lt;br /&gt;
 -gw0             (-p) &amp;lt;string&amp;gt;   :GW approximation (more with -h gw0)&lt;br /&gt;
 -dyson           (-g) &amp;lt;string&amp;gt;   :Dyson Equation solver (more with -h dyson)&lt;br /&gt;
 -lifetimes       (-l)            :GoWo Quasiparticle lifetimes&lt;br /&gt;
 &lt;br /&gt;
 Bethe-Salpeter Equation:&lt;br /&gt;
 -Ksolver         (-y) &amp;lt;string&amp;gt;   :BSE solver (more with -h Ksolver)&lt;br /&gt;
 &lt;br /&gt;
 Total Energy:&lt;br /&gt;
 -acfdt                           :ACFDT Total Energy&lt;br /&gt;
 &lt;br /&gt;
 Utilites:&lt;br /&gt;
 ...&lt;br /&gt;
 -slktest                         :ScaLapacK test&lt;br /&gt;
&lt;br /&gt;
The options can be split into two sets: &amp;lt;br&amp;gt;&lt;br /&gt;
* A set of options used to generate the appropriate input file (default name: &#039;&#039;yambo.in&#039;&#039;), selecting the kind of simulation you would like to perform &amp;lt;br&amp;gt;&lt;br /&gt;
* A set of options used to manage auxiliary functions (e.g. redirecting the I/O, choosing a specific name for the input file, etc.).&lt;br /&gt;
&lt;br /&gt;
===Runlevel selection===&lt;br /&gt;
First of all, you would like to specify which kind of simulation you are going to perform and generate an input file with the first set of options. &lt;br /&gt;
By default, when generating the input file, Yambo will launch the &amp;lt;code&amp;gt;vi&amp;lt;/code&amp;gt; editor.&lt;br /&gt;
The default editor can be changed at configure time, before compilation; alternatively, you can use the &amp;lt;code&amp;gt;-Q&amp;lt;/code&amp;gt; run-time option to skip the automatic editing (do this if you are not familiar with &amp;lt;code&amp;gt;vi&amp;lt;/code&amp;gt;!):&lt;br /&gt;
 $ yambo -hf -Q&lt;br /&gt;
 yambo: input file yambo.in created&lt;br /&gt;
 $ emacs yambo.in     &#039;&#039;or your favourite editing tool&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Multiple options can be used together to activate various tasks or runlevels (in some cases this is actually a necessity). &lt;br /&gt;
For instance, to generate an input file for optical spectra including local field effects (Hartree approximation), do (and then exit)&lt;br /&gt;
 $ yambo -optics c -kernel hartree       &#039;&#039;which switches on:&#039;&#039;&lt;br /&gt;
 optics                       # [R] Linear Response optical properties&lt;br /&gt;
 chi                          # [R][CHI] Dyson equation for Chi.&lt;br /&gt;
 Chimod= &amp;quot;Hartree&amp;quot;            # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
To perform a Hartree-Fock and GW calculation using a plasmon-pole approximation, do (and then exit):&lt;br /&gt;
 $ yambo -hf -gw0 p -dyson n        &#039;&#039;which switches on:&#039;&#039;&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 gw0                          # [R GW] GoWo Quasiparticle energy levels&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix      &lt;br /&gt;
Each runlevel activates its own list of variables and flags.&lt;br /&gt;
&lt;br /&gt;
The previous command is also equivalent to&lt;br /&gt;
 $ yambo -hf -gw0 r -dyson n -X p&lt;br /&gt;
&lt;br /&gt;
===Changing input parameters ===&lt;br /&gt;
Yambo reads various parameters from existing database files and/or input files and uses them to suggest values or ranges. &lt;br /&gt;
Let&#039;s illustrate this by generating the input file for a Hartree-Fock calculation. &lt;br /&gt;
&lt;br /&gt;
 $ yambo -hf&lt;br /&gt;
Inside the generated input file you should find:&lt;br /&gt;
 [[Variables#EXXRLvcs|EXXRLvcs]] =  3187        RL    # [XX] Exchange RL components&lt;br /&gt;
 %[[Variables#QPkrange|QPkrange]]                    # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
   1| 14|  6|10|&lt;br /&gt;
 %&lt;br /&gt;
The &amp;lt;code&amp;gt;[[Variables#QPkrange|QPkrange]]&amp;lt;/code&amp;gt; variable (follow the link for a detailed explanation of any variable) suggests a range of k-points (1 to 14) and bands (here 6 to 10) based on what it finds in the core database &#039;&#039;SAVE/ns.db1&#039;&#039;, i.e. as defined by the DFT code. &amp;lt;br&amp;gt;&lt;br /&gt;
Leave that variable alone, and instead modify the previous variable to &amp;lt;code&amp;gt;EXXRLvcs=  1000        RL&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Save the file, and now generate the input a second time with &amp;lt;code&amp;gt;yambo -x&amp;lt;/code&amp;gt;. You will see:&lt;br /&gt;
  [[Variables#EXXRLvcs|EXXRLvcs]]=  1009        RL&lt;br /&gt;
This indicates that Yambo has read the new input value (1000 G-vectors), checked the database of G-vector shells &#039;&#039;(SAVE/ndb.gops)&#039;&#039;, &lt;br /&gt;
and changed the input value to one that fits a completely closed shell.&lt;br /&gt;
&lt;br /&gt;
Last, note that Yambo variables can be expressed in different &#039;&#039;&#039;units&#039;&#039;&#039;. In this case, &amp;lt;code&amp;gt;RL&amp;lt;/code&amp;gt; can be replaced by an energy unit like Ry, eV, Ha, etc. Energy units are generally better as they are independent of the cell size. Technical information is available on the [[Variables]] page. &lt;br /&gt;
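For example (the value below is purely illustrative), the exchange cutoff of the previous section could be given in Rydberg instead of as a number of RL vectors, and Yambo will internally convert it to the nearest closed shell:&lt;br /&gt;
 EXXRLvcs= 10               Ry    # [XX] Exchange RL components&lt;br /&gt;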
&lt;br /&gt;
The input file generator of Yambo is thus an &#039;&#039;intelligent&#039;&#039; parser, which interacts with the user and the existing databases. For this reason we recommend that you always use Yambo to generate the input files, rather than making them yourself.&lt;br /&gt;
&lt;br /&gt;
===Extra options===&lt;br /&gt;
Extra options modify some of the code&#039;s default settings. They can be used when launching the code but also when generating input files. &lt;br /&gt;
&lt;br /&gt;
Let&#039;s have another look at the possible options (again, we report only the part we are focusing on):&lt;br /&gt;
 $ yambo -h&lt;br /&gt;
 This is      : yambo&lt;br /&gt;
 Version      : 5.0.1 Revision 19547 Hash e90d90f2d &lt;br /&gt;
 Configuration: MPI+OpenMP+SLK+SLEPC+HDF5_MPI_IO &lt;br /&gt;
 &lt;br /&gt;
 Help &amp;amp; version:&lt;br /&gt;
 -help            (-h) &amp;lt;string&amp;gt;   :&amp;lt;string&amp;gt; can be an option (e.g. -h optics)&lt;br /&gt;
 -version                         :Code version &amp;amp; libraries&lt;br /&gt;
 &lt;br /&gt;
 Input file &amp;amp; Directories:&lt;br /&gt;
 -Input           (-F) &amp;lt;string&amp;gt;   :Input file&lt;br /&gt;
 -Verbosity       (-V) &amp;lt;string&amp;gt;   :Input file variables verbosity (more with -h Verbosity)&lt;br /&gt;
 -Job             (-J) &amp;lt;string&amp;gt;   :Job string&lt;br /&gt;
 -Idir            (-I) &amp;lt;string&amp;gt;   :Input directory&lt;br /&gt;
 -Odir            (-O) &amp;lt;string&amp;gt;   :I/O directory&lt;br /&gt;
 -Cdir            (-C) &amp;lt;string&amp;gt;   :Communication directory&lt;br /&gt;
 &lt;br /&gt;
 Parallel Control:&lt;br /&gt;
 -parenv          (-E) &amp;lt;string&amp;gt;   :Environment Parallel Variables file&lt;br /&gt;
 -nompi                           :Switch off MPI support&lt;br /&gt;
 -noopenmp                        :Switch off OPENMP support&lt;br /&gt;
 &lt;br /&gt;
 ...&lt;br /&gt;
 &lt;br /&gt;
 Utilites:&lt;br /&gt;
 -Quiet           (-Q)            :Quiet input file creation&lt;br /&gt;
 -fatlog                          :Verbose (fatter) log(s)&lt;br /&gt;
 -DBlist          (-D)            :Databases properties&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
Command line options are extremely important to master if you want to use yambo productively. &lt;br /&gt;
Often, the meaning is clear from the help menu:&lt;br /&gt;
 $ yambo -F yambo.in_HF -hf   &#039;&#039;Make a Hartree-Fock input file called yambo.in_HF&#039;&#039;&lt;br /&gt;
 $ yambo -D                   &#039;&#039;Summarize the content of the databases in the SAVE folder&#039;&#039;&lt;br /&gt;
 $ yambo -I ../               &#039;&#039;Run the code, using a SAVE folder in a directory one level up&#039;&#039;&lt;br /&gt;
 $ yambo -C MyTest            &#039;&#039;Run the code, putting all report, log, plot files inside a folder MyTest&#039;&#039;&lt;br /&gt;
 &lt;br /&gt;
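Lowercase and uppercase options can also be combined in a single command; for instance (the labels below are arbitrary):&lt;br /&gt;
 $ yambo -F yambo.in_HF -J HFtest -C MyTest   &#039;&#039;Run the code with input yambo.in_HF, labelling new databases and outputs with HFtest, and collecting report, log and plot files in MyTest&#039;&#039;&lt;br /&gt;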
Other options deserve a closer look.&lt;br /&gt;
&lt;br /&gt;
===Verbosity===&lt;br /&gt;
Yambo uses &#039;&#039;many&#039;&#039; input variables, most of which can be left at their default values. To keep input files short and manageable, only a few variables appear by default in the input file. More advanced variables can be switched on by using the &amp;lt;code&amp;gt;-V&amp;lt;/code&amp;gt; verbosity option. These are grouped according to the type of variable. For instance, &amp;lt;code&amp;gt;-V RL&amp;lt;/code&amp;gt; switches on variables related to G-vector summations, and &amp;lt;code&amp;gt;-V io&amp;lt;/code&amp;gt; switches on options related to I/O control. Try: &lt;br /&gt;
&lt;br /&gt;
 $ yambo -optics c -V RL       &#039;&#039;switches on:&#039;&#039;&lt;br /&gt;
 FFTGvecs=  3951        RL    # [FFT] Plane-waves&lt;br /&gt;
 &lt;br /&gt;
 $ yambo -optics c -V io       &#039;&#039;switches on:&#039;&#039;&lt;br /&gt;
 StdoHash=  40                # [IO] Live-timing Hashes&lt;br /&gt;
 DBsIOoff= &amp;quot;none&amp;quot;             # [IO] Space-separated list of DB with NO I/O. DB= ...&lt;br /&gt;
 DBsFRAGpm= &amp;quot;none&amp;quot;            # [IO] Space-separated list of +DB to be FRAG and ...&lt;br /&gt;
 #WFbuffIO                    # [IO] Wave-functions buffered I/O&lt;br /&gt;
&lt;br /&gt;
Unfortunately, -V options must be invoked and changed &#039;&#039;one at a time&#039;&#039;. Once you are more experienced, you may go straight to &amp;lt;code&amp;gt;-V all&amp;lt;/code&amp;gt;, which turns on all possible variables. However, note that &amp;lt;code&amp;gt;yambo -o c -V all&amp;lt;/code&amp;gt; adds an extra 30 variables to the input file, which can be confusing: use it with care.&lt;br /&gt;
&lt;br /&gt;
===Job script label===&lt;br /&gt;
The best way to keep track of different runs using different parameters is through the &amp;lt;code&amp;gt;-J&amp;lt;/code&amp;gt; flag. This inserts a label in all output and report files, and creates a new folder containing any new databases (i.e. they are not written in the core &#039;&#039;SAVE&#039;&#039; folder). Try:&lt;br /&gt;
 $ yambo -V RL -hf -F yambo_hf.in        &#039;&#039;and modify to&#039;&#039;&lt;br /&gt;
 FFTGvecs = 1 Ry&lt;br /&gt;
 EXXRLvcs = 1 Ry&lt;br /&gt;
 VXCRLvcs = 1 Ry&lt;br /&gt;
 $ yambo -J 1Ry -F yambo_hf.in           &#039;&#039;Run the code&#039;&#039;&lt;br /&gt;
 $ ls&lt;br /&gt;
 yambo_hf.in SAVE  &lt;br /&gt;
 o-1Ry.hf r-1Ry_HF_and_locXC 1Ry 1Ry/ndb.HF_and_locXC&lt;br /&gt;
This is extremely useful when running convergence tests, trying out different parameters, etc.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Exercise&#039;&#039;: use &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; to report the properties of all database files (including &#039;&#039;ndb.HF_and_locXC&#039;&#039;)&lt;br /&gt;
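Hint (one possible solution, simply combining two options met above): pass the job label together with the database-listing option, so that the databases inside the &#039;&#039;1Ry&#039;&#039; folder are scanned as well:&lt;br /&gt;
 $ yambo -J 1Ry -D&lt;br /&gt;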
&lt;br /&gt;
==Links==&lt;br /&gt;
* Back to [[ICTP 2022#Tutorials]]&lt;br /&gt;
* Back to [[CECAM VIRTUAL 2021#Tutorials]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| style=&amp;quot;width:100%&amp;quot; border=&amp;quot;1&amp;quot;&lt;br /&gt;
|style=&amp;quot;width:15%; text-align:left&amp;quot;|Prev: [[CECAM_VIRTUAL_2021#Tutorials|CECAM School Home]] &lt;br /&gt;
|style=&amp;quot;width:50%; text-align:center&amp;quot;|Now: CECAM School Home -&amp;gt; [[First_steps:_walk_through_from_DFT(standalone)|First steps]] &lt;br /&gt;
|style=&amp;quot;width:35%; text-align:right&amp;quot;|Next: CECAM School Home -&amp;gt; [[Next steps: RPA calculations (standalone)|Next steps]]&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6931</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6931"/>
		<updated>2023-05-23T15:47:14Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* RIM */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*Acceleration techniques for GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms available in Yambo that speed up the convergence of the self-energy with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
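If you want to extract this value yourself (a minimal sketch, assuming the usual &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; layout with columns K-point, Band, Eo and E-Eo), print the non-comment lines and take the difference of the corrected energies Eo+(E-Eo) between band 14 and band 13 at k-point 7:&lt;br /&gt;
&lt;br /&gt;
 grep -v ^# o-80b_10Ry.qp&lt;br /&gt;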
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms vary slowly with q. In Yambo this integration is usually performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral outside the sphere. This is usually&lt;br /&gt;
not problematic, because for a large number of q- and k-points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even just the gamma point, a better&lt;br /&gt;
integration of this term can be performed by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
This generates the following input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random q-points used to perform the integral via Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, used to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will obtain a slightly different band gap, but in the limit of a dense k-point grid the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 4.109 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ.&lt;br /&gt;
Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method.&lt;br /&gt;
The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation using the RIM-W algorithm.&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use parallel strategies in Yambo. As a test calculation, we compute the quasiparticle corrections on the full k-point grid and for a larger number of bands than in the previous calculation.&lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively; rather, we will submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; refers to the screening function itself (it stands for χ, since it is a response function), and &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; to the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
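As an illustration (a hypothetical assignment, not taken from the tutorial files), with 4 MPI tasks the response function could be distributed over both k-points and conduction bands; the product of the entries in each &amp;lt;code&amp;gt;*_CPU&amp;lt;/code&amp;gt; string must always equal the total number of MPI tasks:&lt;br /&gt;
&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 2 2 1&amp;quot;     # [PARALLEL] 2 tasks over k, 2 over c: 1x1x2x2x1 = 4 MPI tasks&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;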
&lt;br /&gt;
We start by calculating the QP corrections using only MPI tasks and a single OpenMP thread. Therefore, create the submission script as follows:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
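You can check that the job has been queued and is running with the standard SLURM commands, e.g.&lt;br /&gt;
 squeue -u $USER&lt;br /&gt;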
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI2_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and the number of MPI tasks equals the total number of CPUs.&lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Alternatively, any positive number directly specifies the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
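One simple way to compare them (just a suggestion; the exact content and position of the timing summary may vary between Yambo versions) is to look at the end of the report file of each run:&lt;br /&gt;
&lt;br /&gt;
 tail -n 20 run_MPI*_OMP*.out/r-*&lt;br /&gt;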
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if you run into Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs.&lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to significant speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, so the number of MPI tasks should match the number of GPUs actually used. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. In fact, here we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
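For example, if you wanted to employ all four GPUs of a node, a natural variation of the header above (purely illustrative, not part of the tutorial files) would be to match the number of tasks to the number of GPUs:&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;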
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU build might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running with 2 MPI tasks, each assigned to a GPU card, and using 1 OpenMP thread per task.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 min 30 s of the purely CPU run. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6929</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6929"/>
		<updated>2023-05-23T15:27:29Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* RIM */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*Acceleration techniques for GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms available in Yambo that speed up the convergence of the self-energy with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms vary slowly with q. In Yambo this integration is usually performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral outside the sphere. This is usually&lt;br /&gt;
not problematic, because for a large number of q- and k-points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even just the gamma point, a better&lt;br /&gt;
integration of this term can be performed by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
This generates the following input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random q-points used to perform the integral via Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, used to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will obtain a slightly different band gap, but in the limit of a dense k-point grid the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 2.598 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ.&lt;br /&gt;
Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method.&lt;br /&gt;
The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation using the RIM-W algorithm.&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
In this part of the tutorial, we will see how to use parallel strategies in Yambo. As a test calculation, we compute the quasiparticle corrections on the full k-point grid and for a larger number of bands than in the previous calculation.&lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively; rather, we will submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will use the slurm submission script &amp;lt;code&amp;gt;job_parallel.sh&amp;lt;/code&amp;gt;, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script appends additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the thread variables control the [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
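&lt;br /&gt;
For instance, with the 2-task script above (which sets the label to &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;), you can check where everything ends up with commands along these lines:&lt;br /&gt;
&lt;br /&gt;
 ls run_MPI2_OMP1.out        # report (r-*) and output (o-*) files, written to the -C directory&lt;br /&gt;
 ls run_MPI2_OMP1.out/LOG    # one log (l-*) file per MPI task&lt;br /&gt;
 ls run_MPI2_OMP1            # databases (ndb.*), written to the -J directory&lt;br /&gt;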
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI2_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks should equal the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
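For example, to force two OpenMP threads only in the self-energy part, independently of &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt;, one could set (a hypothetical choice, not needed in this tutorial):&lt;br /&gt;
&lt;br /&gt;
 SE_Threads= 2      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;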
You can try to run this calculation and check whether it is faster than the pure MPI run from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower the memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, so the number of GPUs actually used equals the number of MPI tasks (two in this case). OpenMP threading is allowed, but too many threads reduce efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
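As a hypothetical variant (not among the tutorial files), a job that drives all four GPUs of a node would simply request one MPI task per GPU:&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;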
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may even be the same, with no additional modules to load. In conclusion, we are running with 2 MPI tasks, each driving one GPU card and using a single thread. &lt;br /&gt;
&lt;br /&gt;
The calculation should be faster: about 2 minutes instead of the 2 minutes and 30 seconds of the purely CPU calculation. The gain can become even larger in bigger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6928</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6928"/>
		<updated>2023-05-23T15:24:03Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material based on:&lt;br /&gt;
*GW acceleration techniques, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has terminated, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
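&lt;br /&gt;
If you want to extract the gap directly from the shell, a one-liner along these lines can help (a sketch, assuming the usual column layout of the &amp;lt;code&amp;gt;.qp&amp;lt;/code&amp;gt; output file, i.e. K-point, Band, Eo, E-Eo, ...; the quasiparticle energy is Eo + (E-Eo)):&lt;br /&gt;
&lt;br /&gt;
 awk &#039;$2==13 {v=$3+$4} $2==14 {c=$3+$4} END {print c-v}&#039; o-80b_10Ry.qp&lt;br /&gt;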
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the convergence with the k-grid and achieve a speed-up, briefly consider again the expression of the exchange part of the self-energy:&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic here due to the presence of a diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually, in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral outside the sphere. This usually &lt;br /&gt;
is not problematic because, for a large number of q and k points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, used to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;none&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 2.598 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work as well for the correlation part of the self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show&lt;br /&gt;
the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
since the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential.&lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation, this time using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure for a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Differently from the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will use the slurm submission script &amp;lt;code&amp;gt;job_parallel.sh&amp;lt;/code&amp;gt;, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script appends additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the thread variables control the [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
 &lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 &lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
 &lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
 &lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
 &lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
 &lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 &lt;br /&gt;
 EOF&lt;br /&gt;
 &lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
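&lt;br /&gt;
For instance, with the 2-task script above (which sets the label to &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;), you can check where everything ends up with commands along these lines:&lt;br /&gt;
&lt;br /&gt;
 ls run_MPI2_OMP1.out        # report (r-*) and output (o-*) files, written to the -C directory&lt;br /&gt;
 ls run_MPI2_OMP1.out/LOG    # one log (l-*) file per MPI task&lt;br /&gt;
 ls run_MPI2_OMP1            # databases (ndb.*), written to the -J directory&lt;br /&gt;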
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI2_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks should equal the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
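For example, to force two OpenMP threads only in the self-energy part, independently of &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt;, one could set (a hypothetical choice, not needed in this tutorial):&lt;br /&gt;
&lt;br /&gt;
 SE_Threads= 2      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;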
You can try to run this calculation and check whether it is faster than the pure MPI run from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower the memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, so the number of GPUs actually used equals the number of MPI tasks (two in this case). OpenMP threading is allowed, but too many threads reduce efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
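As a hypothetical variant (not among the tutorial files), a job that drives all four GPUs of a node would simply request one MPI task per GPU:&lt;br /&gt;
&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;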
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may even be the same, with no additional modules to load. In conclusion, we are running with 2 MPI tasks, each driving one GPU card and using a single thread. &lt;br /&gt;
&lt;br /&gt;
The calculation should be faster: about 2 minutes instead of the 2 minutes and 30 seconds of the purely CPU calculation. The gain can become even larger in bigger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6927</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6927"/>
		<updated>2023-05-23T15:23:35Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material based on:&lt;br /&gt;
*GW acceleration techniques, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has terminated, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
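&lt;br /&gt;
If you want to extract the gap directly from the shell, a one-liner along these lines can help (a sketch, assuming the usual column layout of the &amp;lt;code&amp;gt;.qp&amp;lt;/code&amp;gt; output file, i.e. K-point, Band, Eo, E-Eo, ...; the quasiparticle energy is Eo + (E-Eo)):&lt;br /&gt;
&lt;br /&gt;
 awk &#039;$2==13 {v=$3+$4} $2==14 {c=$3+$4} END {print c-v}&#039; o-80b_10Ry.qp&lt;br /&gt;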
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the convergence with the k-grid and achieve a speed-up, briefly consider again the expression of the exchange part of the self-energy:&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic here due to the presence of a diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually, in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral outside the sphere. This usually &lt;br /&gt;
is not problematic because, for a large number of q and k points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, used to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;none&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 2.598 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work as well for the correlation part of the self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show&lt;br /&gt;
the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
since the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential.&lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation, this time using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure for a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Differently from the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will use the slurm submission script &amp;lt;code&amp;gt;job_parallel.sh&amp;lt;/code&amp;gt;, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script appends additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the thread variables control the [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, create the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --partition=m100_sys_test&lt;br /&gt;
 #SBATCH --qos=qos_test&lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo&lt;br /&gt;
 #SBATCH --time=0:30:00&lt;br /&gt;
 #SBATCH --account=tra23_Yambo&lt;br /&gt;
 #SBATCH --mem=230000MB&lt;br /&gt;
 #SBATCH --job-name=rutile&lt;br /&gt;
 #SBATCH --error=err.job-%j&lt;br /&gt;
 #SBATCH --output=out.job-%j&lt;br /&gt;
&lt;br /&gt;
 # load yambo and dependencies&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
&lt;br /&gt;
 # info&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 nthreads=${OMP_NUM_THREADS}&lt;br /&gt;
&lt;br /&gt;
 label=MPI${ncpu}_OMP${nthreads}&lt;br /&gt;
 jdir=run_${label}&lt;br /&gt;
 cdir=run_${label}.out&lt;br /&gt;
&lt;br /&gt;
 # Update input file&lt;br /&gt;
 filein0=gw.in         # Original file&lt;br /&gt;
 filein=gw_${label}.in # New file&lt;br /&gt;
&lt;br /&gt;
 cp -f $filein0 $filein&lt;br /&gt;
 cat &amp;gt;&amp;gt; $filein &amp;lt;&amp;lt; EOF&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;          # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;            # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0               # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot; # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;   # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV=1    # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)&lt;br /&gt;
 X_Threads=  0                 # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot;1 $ncpu 1&amp;quot;           # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;            # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0                # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
 EOF&lt;br /&gt;
&lt;br /&gt;
 # run yambo&lt;br /&gt;
 mpirun -n $ncpu yambo -F $filein -J ${jdir} -C $cdir&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
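&lt;br /&gt;
For instance, with the 2-task script above (which sets the label to &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;), you can check where everything ends up with commands along these lines:&lt;br /&gt;
&lt;br /&gt;
 ls run_MPI2_OMP1.out        # report (r-*) and output (o-*) files, written to the -C directory&lt;br /&gt;
 ls run_MPI2_OMP1.out/LOG    # one log (l-*) file per MPI task&lt;br /&gt;
 ls run_MPI2_OMP1            # databases (ndb.*), written to the -J directory&lt;br /&gt;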
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI2_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks should equal the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
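For example, to force two OpenMP threads only in the self-energy part, independently of &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt;, one could set (a hypothetical choice, not needed in this tutorial):&lt;br /&gt;
&lt;br /&gt;
 SE_Threads= 2      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;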
You can try to run this calculation and check whether it is faster than the pure MPI run from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower the memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, so the number of MPI tasks should match the number of GPUs being used. OpenMP threading is allowed, but not with too many threads, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running over 4 GPU cards, distributed over 2 MPI tasks, each using 1 thread.&lt;br /&gt;
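&lt;br /&gt;
As a quick sanity check that the GPU-enabled binary is the one actually picked up from the PATH, something along these lines can be tried (a sketch: if the executable is dynamically linked, the grep should show CUDA-related libraries, but names and linking details vary between machines and builds):&lt;br /&gt;
&lt;br /&gt;
 which yambo&lt;br /&gt;
 ldd $(which yambo) | grep -i cuda&lt;br /&gt;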
&lt;br /&gt;
The calculation should now be faster: about 2 minutes instead of 2 minutes and 30 seconds for the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6926</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6926"/>
		<updated>2023-05-23T15:20:55Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* RIM */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material, based on:&lt;br /&gt;
*Acceleration techniques of GW, some of which specific for 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
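&lt;br /&gt;
If you prefer to read the numbers directly from the terminal, the data lines of the QP file can be printed by filtering out the comments (a sketch; the meaning of each column is documented in the header of the &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; file itself, so check it there before doing arithmetic on the columns):&lt;br /&gt;
&lt;br /&gt;
 grep -v &#039;^#&#039; o-80b_10Ry.qp&lt;br /&gt;
&lt;br /&gt;
The quasiparticle energy of each state is Eo plus the E-Eo correction listed in the file, and the gap is the difference between the values for bands 14 and 13 at k-point 7.&lt;br /&gt;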
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 here is problematic for the presence of a 1/q from the Coulomb potential that diverges, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside this small region. This is usually not problematic because, for a large number of q- and k-points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;none&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 2.598 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration around q=0 of the Coulomb potential via the RIM approach does not work well for the correlation part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed that performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag to your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed ?&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure for a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively; rather, we will submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
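&lt;br /&gt;
Note that, for each section, the product of the numbers in the &amp;lt;code&amp;gt;*_CPU&amp;lt;/code&amp;gt; string has to match the number of MPI tasks assigned to that section. As a purely illustrative sketch (not part of the provided script), 8 MPI tasks could for instance be split in the self-energy as:&lt;br /&gt;
&lt;br /&gt;
 SE_CPU= &amp;quot;1 2 4&amp;quot;           # hypothetical split: 1 task on q, 2 on qp, 4 on b (1x2x4 = 8)&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;        # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;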
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, edit the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=8&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and the number of MPI tasks must equal the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, how much you gain from OpenMP depends on how close you already are to the limit of efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, so the number of MPI tasks should match the number of GPUs being used. OpenMP threading is allowed, but not with too many threads, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running over 4 GPU cards, distributed over 2 MPI tasks, each using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should now be faster: about 2 minutes instead of 2 minutes and 30 seconds for the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6923</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6923"/>
		<updated>2023-05-23T10:36:39Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 3: Running on GPU */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material, based on:&lt;br /&gt;
*Acceleration techniques of GW, some of which specific for 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 here is problematic for the presence of a 1/q from the Coulomb potential that diverges, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside this small region. This is usually not problematic because, for a large number of q- and k-points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 20 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;none&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 2.598 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration around q=0 of the Coulomb potential via the RIM approach does not work well for the correlation part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed that performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag to your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation again using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed ?&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure for a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively; rather, we will submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, edit the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=8&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and the number of MPI tasks must equal the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute 8 cpus with 4 MPI tasks and then use 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, how much you gain from OpenMP depends on how close you already are to the limit of efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, so the number of MPI tasks should match the number of GPUs being used. OpenMP threading is allowed, but not with too many threads, otherwise we lose efficiency. Here, in fact, we are not using OpenMP, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, we are running over 4 GPU cards, distributed over 2 MPI tasks, each using 1 thread.&lt;br /&gt;
&lt;br /&gt;
The calculation should now be faster: about 2 minutes instead of 2 minutes and 30 seconds for the purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6922</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6922"/>
		<updated>2023-05-23T10:35:51Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 3: Running on GPU */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material, based on:&lt;br /&gt;
*Acceleration techniques of GW, some of which specific for 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 here is problematic for the presence of a 1/q from the Coulomb potential that diverges, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside this small region. This is usually not problematic because, for a large number of q- and k-points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral by Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a large number of k-points the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 20 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;none&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 2.598 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed that integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation, this time using the RIM-W algorithm: &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
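&lt;br /&gt;
As a quick way to compare the runs, you can print the non-comment lines of the corresponding output files. This is just a convenience sketch: the exact column layout of the &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; files depends on the Yambo version, so check the header comments first.&lt;br /&gt;
&lt;br /&gt;
 # print the quasiparticle rows (k-point 7, bands 13-14) of the three runs&lt;br /&gt;
 grep -v &#039;^#&#039; o-80b_10Ry.qp o-80b_10Ry_rim.qp o-80b_10Ry_rimw.qp&lt;br /&gt;
&lt;br /&gt;
The gap is the difference between the quasiparticle energies of band 14 and band 13 at k-point 7.&lt;br /&gt;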
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure, including a larger number of bands than in the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively, but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for the [OPENMP] parallelisation.&lt;br /&gt;
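&lt;br /&gt;
To make the connection with the scheduler explicit, here is a minimal sketch of how a script like job_parallel.sh may derive these quantities from the SLURM allocation; the variable names and exact lines are illustrative and not necessarily those of the actual script.&lt;br /&gt;
&lt;br /&gt;
 # MPI tasks and OpenMP threads taken from the SLURM request&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 # $ncpu is then substituted into the [PARALLEL] variables above and yambo is launched with the MPI wrapper&lt;br /&gt;
 srun yambo -F gw.in&lt;br /&gt;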
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, edit the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=8&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute the 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
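&lt;br /&gt;
For instance, to force two threads in the self-energy part only, while leaving the other sections to follow OMP_NUM_THREADS, one could set (the value is illustrative):&lt;br /&gt;
&lt;br /&gt;
 SE_Threads=  2      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;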
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you can expect a significant gain with OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to considerable speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, so here we set &amp;lt;code&amp;gt;ntasks-per-node=2&amp;lt;/code&amp;gt;. OpenMP threading is allowed, but too many threads reduce efficiency. Here, in fact, we are not using OpenMP, so the script sets &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
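&lt;br /&gt;
As a minimal sketch of the job body (assuming the SLURM header above; the &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; lines are illustrative and not necessarily those of the actual script):&lt;br /&gt;
&lt;br /&gt;
 export OMP_NUM_THREADS=1&lt;br /&gt;
 # optional sanity check: list the GPUs visible to the job&lt;br /&gt;
 srun nvidia-smi -L&lt;br /&gt;
 # launch the GPU-enabled executable, one MPI task per GPU&lt;br /&gt;
 srun yambo -F gw.in -J MPI2_OMP1&lt;br /&gt;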
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In summary, here we are running with 2 MPI tasks, each attached to a GPU card, and a single OpenMP thread per task.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 minutes and 30 seconds of a purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6921</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6921"/>
		<updated>2023-05-23T10:30:35Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 3: Running on GPU */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material, based on:&lt;br /&gt;
*Acceleration techniques for GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic here due to the presence of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside this small region. This is usually not problematic, because for a large number of q and k points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential that we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the Monte Carlo integration. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, used to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a dense k-point grid the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or even with the gamma point only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 20 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;none&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 2.598 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed that integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation, this time using the RIM-W algorithm: &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
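&lt;br /&gt;
As a quick way to compare the runs, you can print the non-comment lines of the corresponding output files. This is just a convenience sketch: the exact column layout of the &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; files depends on the Yambo version, so check the header comments first.&lt;br /&gt;
&lt;br /&gt;
 # print the quasiparticle rows (k-point 7, bands 13-14) of the three runs&lt;br /&gt;
 grep -v &#039;^#&#039; o-80b_10Ry.qp o-80b_10Ry_rim.qp o-80b_10Ry_rimw.qp&lt;br /&gt;
&lt;br /&gt;
The gap is the difference between the quasiparticle energies of band 14 and band 13 at k-point 7.&lt;br /&gt;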
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure, including a larger number of bands than in the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively, but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for the [OPENMP] parallelisation.&lt;br /&gt;
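&lt;br /&gt;
To make the connection with the scheduler explicit, here is a minimal sketch of how a script like job_parallel.sh may derive these quantities from the SLURM allocation; the variable names and exact lines are illustrative and not necessarily those of the actual script.&lt;br /&gt;
&lt;br /&gt;
 # MPI tasks and OpenMP threads taken from the SLURM request&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 # $ncpu is then substituted into the [PARALLEL] variables above and yambo is launched with the MPI wrapper&lt;br /&gt;
 srun yambo -F gw.in&lt;br /&gt;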
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, edit the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=8&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute the 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you can expect a significant gain with OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to considerable speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, therefore we have &amp;lt;code&amp;gt;ntasks-per-node=2&amp;lt;/code&amp;gt;. OpenMP threading is allowed, but too many threads reduce efficiency. Here, in fact, we are not using OpenMP, so the script sets &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
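&lt;br /&gt;
As a minimal sketch of the job body (assuming the SLURM header above; the &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; lines are illustrative and not necessarily those of the actual script):&lt;br /&gt;
&lt;br /&gt;
 export OMP_NUM_THREADS=1&lt;br /&gt;
 # optional sanity check: list the GPUs visible to the job&lt;br /&gt;
 srun nvidia-smi -L&lt;br /&gt;
 # launch the GPU-enabled executable, one MPI task per GPU&lt;br /&gt;
 srun yambo -F gw.in -J MPI2_OMP1&lt;br /&gt;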
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In summary, here we are running with 2 MPI tasks, each attached to a GPU card, and a single OpenMP thread per task.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 minutes and 30 seconds of a purely CPU calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6920</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6920"/>
		<updated>2023-05-23T10:29:50Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 2: GW parallel strategies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation in a 2D material, based on:&lt;br /&gt;
*Acceleration techniques for GW, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic here due to the presence of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that falls outside this small region. This is usually not problematic, because for a large number of q and k points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red &amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential that we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the Monte Carlo integration. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, used to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a dense k-point grid the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or even with the gamma point only, and it is applied to both the exchange and the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now make a GW computation using the RIM method&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 20 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;none&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 2.598 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2 with respect to the number of sampling points of the BZ. Blue lines: standard integration methods; the extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed that integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag when generating your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red  &amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation, this time using the RIM-W algorithm: &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
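&lt;br /&gt;
As a quick way to compare the runs, you can print the non-comment lines of the corresponding output files. This is just a convenience sketch: the exact column layout of the &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; files depends on the Yambo version, so check the header comments first.&lt;br /&gt;
&lt;br /&gt;
 # print the quasiparticle rows (k-point 7, bands 13-14) of the three runs&lt;br /&gt;
 grep -v &#039;^#&#039; o-80b_10Ry.qp o-80b_10Ry_rim.qp o-80b_10Ry_rimw.qp&lt;br /&gt;
&lt;br /&gt;
The gap is the difference between the quasiparticle energies of band 14 and band 13 at k-point 7.&lt;br /&gt;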
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure, including a larger number of bands than in the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively, but rather submit a job script, as is commonly done when using Yambo on clusters. So, exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_2Dquasiparticle_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;. You should see we compute a much larger number of quasi-particles:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for the [OPENMP] parallelisation.&lt;br /&gt;
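&lt;br /&gt;
To make the connection with the scheduler explicit, here is a minimal sketch of how a script like job_parallel.sh may derive these quantities from the SLURM allocation; the variable names and exact lines are illustrative and not necessarily those of the actual script.&lt;br /&gt;
&lt;br /&gt;
 # MPI tasks and OpenMP threads taken from the SLURM request&lt;br /&gt;
 ncpu=${SLURM_NTASKS}&lt;br /&gt;
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
 # $ncpu is then substituted into the [PARALLEL] variables above and yambo is launched with the MPI wrapper&lt;br /&gt;
 srun yambo -F gw.in&lt;br /&gt;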
&lt;br /&gt;
We start by calculating the QP corrections using only the MPI tasks and a single openMP thread. Therefore, edit the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=8&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks is equal to the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute the 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related openMP variables for the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS” and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check whether it is faster than the pure MPI one from before. In general, you can expect a significant gain with OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to considerable speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task, therefore we have &amp;lt;code&amp;gt;ntasks-per-node=2&amp;lt;/code&amp;gt;. OpenMP threading is allowed, but too many threads reduce efficiency. Here, in fact, we are not using OpenMP, so the script sets &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
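&lt;br /&gt;
As a minimal sketch of the job body (assuming the SLURM header above; the &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; lines are illustrative and not necessarily those of the actual script):&lt;br /&gt;
&lt;br /&gt;
 export OMP_NUM_THREADS=1&lt;br /&gt;
 # optional sanity check: list the GPUs visible to the job&lt;br /&gt;
 srun nvidia-smi -L&lt;br /&gt;
 # launch the GPU-enabled executable, one MPI task per GPU&lt;br /&gt;
 srun yambo -F gw.in -J MPI2_OMP1&lt;br /&gt;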
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In summary, here we are running with 2 MPI tasks, each attached to a GPU card, and a single OpenMP thread per task.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster, taking about 2 minutes instead of the 2 minutes and 30 seconds of a purely MPI calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Rome_2023&amp;diff=6919</id>
		<title>Rome 2023</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Rome_2023&amp;diff=6919"/>
		<updated>2023-05-23T10:24:00Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* DAY 2 - Tuesday, 23 May */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A general description of the goal(s) of the school can be found on the [https://www.yambo-code.eu/2023/02/18/yambo-school-2023/ Yambo main website]&lt;br /&gt;
&lt;br /&gt;
== Use CINECA computational resources ==&lt;br /&gt;
Yambo tutorials will be run on the MARCONI100 (M100) accelerated cluster. You can find info about M100 [https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.2%3A+MARCONI100+UserGuide#UG3.2:MARCONI100UserGuide here].&lt;br /&gt;
In order to access computational resources provided by CINECA you need your personal username and password that were sent you by the organizers.&lt;br /&gt;
&lt;br /&gt;
=== Connect to the cluster using ssh ===&lt;br /&gt;
&lt;br /&gt;
You can access M100 via &amp;lt;code&amp;gt;ssh&amp;lt;/code&amp;gt; protocol in different ways.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Connect using username and password &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use the following command replacing your username:&lt;br /&gt;
 $ ssh username@login.m100.cineca.it&lt;br /&gt;
&lt;br /&gt;
However, in this way you have to type your password each time you want to connect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Connect using ssh key &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You can set up an ssh key pair to avoid typing the password each time you want to connect to M100. To do so, go to your &amp;lt;code&amp;gt;.ssh&amp;lt;/code&amp;gt; directory (usually located in the &amp;lt;code&amp;gt;home&amp;lt;/code&amp;gt; directory):&lt;br /&gt;
 $ cd $HOME/.ssh&lt;br /&gt;
&lt;br /&gt;
If you don&#039;t have this directory, you can create it with &amp;lt;code&amp;gt;mkdir $HOME/.ssh&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Once you are in the &amp;lt;code&amp;gt;.ssh&amp;lt;/code&amp;gt; directory, run the &amp;lt;code&amp;gt;ssh-keygen&amp;lt;/code&amp;gt; command to generate a private/public key pair:&lt;br /&gt;
 $ ssh-keygen&lt;br /&gt;
 Generating public/private rsa key pair.&lt;br /&gt;
 Enter file in which to save the key: m100_id_rsa&lt;br /&gt;
 Enter passphrase (empty for no passphrase): &lt;br /&gt;
 Enter same passphrase again: &lt;br /&gt;
 Your identification has been saved in &amp;lt;your_.ssh_dir&amp;gt;/m100_id_rsa&lt;br /&gt;
 Your public key has been saved in &amp;lt;your_.ssh_dir&amp;gt;/m100_id_rsa.pub&lt;br /&gt;
 The key fingerprint is:&lt;br /&gt;
 &amp;lt;...&amp;gt;&lt;br /&gt;
 The key&#039;s randomart image is:&lt;br /&gt;
 &amp;lt;...&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you need to copy the &#039;&#039;&#039;public&#039;&#039;&#039; key to M100. You can do that with the following command (for this step you need to type your password):&lt;br /&gt;
 $ ssh-copy-id -i &amp;lt;your_.ssh_dir&amp;gt;/m100_id_rsa.pub &amp;lt;username&amp;gt;@login.m100.cineca.it&lt;br /&gt;
&lt;br /&gt;
Once the public key has been copied, you can connect to M100 without having to type the password using the &amp;lt;code&amp;gt;-i&amp;lt;/code&amp;gt; option:&lt;br /&gt;
 $ ssh -i &amp;lt;your_.ssh_dir&amp;gt;/m100_id_rsa username@login.m100.cineca.it&lt;br /&gt;
&lt;br /&gt;
To simplify even more, you can paste the following lines in a file named &amp;lt;code&amp;gt;config&amp;lt;/code&amp;gt; located inside the &amp;lt;code&amp;gt;.ssh&amp;lt;/code&amp;gt; directory adjusting username and path:&lt;br /&gt;
 Host m100 &lt;br /&gt;
  HostName login.m100.cineca.it&lt;br /&gt;
  User username&lt;br /&gt;
  IdentityFile &amp;lt;your_.ssh_dir&amp;gt;/m100_id_rsa&lt;br /&gt;
&lt;br /&gt;
With the &amp;lt;code&amp;gt;config&amp;lt;/code&amp;gt; file setup you can connect simply with&lt;br /&gt;
 $ ssh m100&lt;br /&gt;
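&lt;br /&gt;
The same &amp;lt;code&amp;gt;config&amp;lt;/code&amp;gt; alias also works for file transfers; for example, to copy a result file from M100 to your local machine (the remote path is just a placeholder):&lt;br /&gt;
 $ scp m100:&amp;lt;path_to_file&amp;gt; .&lt;br /&gt;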
&lt;br /&gt;
=== General instructions to run tutorials ===&lt;br /&gt;
&lt;br /&gt;
Before proceeding, it is useful to know the different workspaces you have available on M100, which can be accessed using environment variables. The main ones are:&lt;br /&gt;
* &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;: it&#039;s the &amp;lt;code&amp;gt;home&amp;lt;/code&amp;gt; directory associated to your username; &lt;br /&gt;
* &amp;lt;code&amp;gt;$WORK&amp;lt;/code&amp;gt;: it&#039;s the &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt; directory associated to the account where the computational resources dedicated to this school are allocated;&lt;br /&gt;
* &amp;lt;code&amp;gt;$CINECA_SCRATCH&amp;lt;/code&amp;gt;: it&#039;s the &amp;lt;code&amp;gt;scratch&amp;lt;/code&amp;gt; directory associated to your username.&lt;br /&gt;
You can find more details about storage and FileSystems [https://wiki.u-gov.it/confluence/display/SCAIUS/UG2.5%3A+Data+storage+and+FileSystems here].&lt;br /&gt;
&lt;br /&gt;
Please don&#039;t forget to &#039;&#039;&#039;run all tutorials in your scratch directory&#039;&#039;&#039;:&lt;br /&gt;
 $ echo $CINECA_SCRATCH&lt;br /&gt;
 /m100_scratch/userexternal/username&lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
&lt;br /&gt;
Computational resources on M100 are managed by the job scheduling system [https://slurm.schedmd.com/overview.html Slurm]. Most of the Yambo tutorials during this school can be run in serial, except for some that need to be executed on multiple processors. Generally, Slurm batch jobs are submitted using a script, but the tutorials here are better understood if run interactively. The two procedures that we will use to submit interactive and non-interactive jobs are explained below.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Run a job using a batch script &#039;&#039;&#039;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
This procedure is suggested for the tutorials and examples that need to be run in parallel. In these cases you need to submit the job using a batch script &amp;lt;code&amp;gt;job.sh&amp;lt;/code&amp;gt;, whose generic structure is the following:&lt;br /&gt;
 $ more job.sh&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra23_Yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:10:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=JOB                  # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=m100_sys_test       # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --qos=qos_test                  # qos = quality of service &lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school &lt;br /&gt;
 #          &lt;br /&gt;
 #SBATCH --nodes=&amp;lt;N&amp;gt;                     # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=&amp;lt;nt&amp;gt;          # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --ntasks-per-socket=&amp;lt;nt/2&amp;gt;      # Tasks invoked on each socket&lt;br /&gt;
 #SBATCH --cpus-per-task=&amp;lt;nc&amp;gt;            # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-mpi/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=&amp;lt;nc&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 mpirun --rank-by core -np ${SLURM_NTASKS} \&lt;br /&gt;
        yambo -F &amp;lt;input&amp;gt; -J &amp;lt;output&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This procedure is suggested for the tutorials and examples that need to be run in parallel. In these cases you need to submit the job using a batch script &amp;lt;code&amp;gt;job.sh&amp;lt;/code&amp;gt;. Please note that the instructions in the batch script must be compatible with the specific M100 [https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.2%3A+MARCONI100+UserGuide#UG3.2:MARCONI100UserGuide-SystemArchitecture architecture] and [https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.2%3A+MARCONI100+UserGuide#UG3.2:MARCONI100UserGuide-Accounting accounting] systems. The complete list of Slurm options can be found [https://slurm.schedmd.com/sbatch.html here]. However you will find &#039;&#039;&#039;ready-to-use&#039;&#039;&#039; batch scripts in locations specified during the tutorials. &lt;br /&gt;
&lt;br /&gt;
To submit the job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 $ sbatch job.sh&lt;br /&gt;
 Submitted batch job &amp;lt;JOBID&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To check the job status, use the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 $ squeue -u &amp;lt;username&amp;gt;&lt;br /&gt;
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              &amp;lt;...&amp;gt;  m100_...      JOB username  R       0:01    &amp;lt;N&amp;gt; &amp;lt;...&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you need to cancel your job, do:&lt;br /&gt;
 $ scancel &amp;lt;JOBID&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Open an interactive session &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This procedure is suggested for most of the tutorials, since the majority of them are meant to be run in serial (as far as MPI parallelization is concerned) from the command line. Use the command below to open an interactive session of 1 hour (complete documentation [https://slurm.schedmd.com/salloc.html here]):&lt;br /&gt;
 $ salloc -A tra23_Yambo -p m100_sys_test -q qos_test --reservation=s_tra_yambo --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 -t 01:00:00&lt;br /&gt;
 salloc: Granted job allocation 10164647&lt;br /&gt;
 salloc: Waiting for resource configuration&lt;br /&gt;
 salloc: Nodes r256n01 are ready for job&lt;br /&gt;
&lt;br /&gt;
We ask for 4 cpus-per-task because we can exploit OpenMP parallelization with the available resources.&lt;br /&gt;
&lt;br /&gt;
With &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; you can see that there is now a job running:&lt;br /&gt;
 $ squeue -u username&lt;br /&gt;
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
           10164647 m100_usr_ interact username  R       0:02      1 r256n01&lt;br /&gt;
&lt;br /&gt;
To run the tutorial, &amp;lt;code&amp;gt;ssh&amp;lt;/code&amp;gt; into the node specified by the job allocation and &amp;lt;code&amp;gt;cd&amp;lt;/code&amp;gt; to your scratch directory:&lt;br /&gt;
 username@&#039;&#039;&#039;login02&#039;&#039;&#039;$ ssh r256n01&lt;br /&gt;
 ...&lt;br /&gt;
 username@&#039;&#039;&#039;r256n01&#039;&#039;&#039;$ cd $CINECA_SCRATCH&lt;br /&gt;
&lt;br /&gt;
Then, you need to manually load &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; as in the batch script above. Please note that the serial version of the code is in a different directory and does not need &amp;lt;code&amp;gt;spectrum_mpi&amp;lt;/code&amp;gt;:&lt;br /&gt;
 $ module purge&lt;br /&gt;
 $ module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 $ export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
Finally, set the &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; environment variable to 4 (as in the &amp;lt;code&amp;gt;--cpus-per-task&amp;lt;/code&amp;gt; option):&lt;br /&gt;
 $ export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
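At this point you can launch yambo directly from the command line inside the allocation; a minimal sketch of the run line (input file and job label are placeholders):&lt;br /&gt;
 $ yambo -F &amp;lt;input&amp;gt; -J &amp;lt;jobname&amp;gt;&lt;br /&gt;
&lt;br /&gt;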
To close the interactive session when you have finished, log out of the compute node with the &amp;lt;code&amp;gt;exit&amp;lt;/code&amp;gt; command, and then cancel the job:&lt;br /&gt;
 $ exit&lt;br /&gt;
 $ scancel &amp;lt;JOBID&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Plot results with gnuplot &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
During the tutorials you will often need to plot the results of the calculations. In order to do so on M100, &#039;&#039;&#039;open a new terminal window&#039;&#039;&#039; and connect to M100 enabling X11 forwarding with the &amp;lt;code&amp;gt;-X&amp;lt;/code&amp;gt; option:&lt;br /&gt;
 $ ssh -X m100&lt;br /&gt;
&lt;br /&gt;
Please note that &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; can be used in this way only from the login nodes:&lt;br /&gt;
 username@&#039;&#039;&#039;login01&#039;&#039;&#039;$ cd &amp;lt;directory_with_data&amp;gt;&lt;br /&gt;
 username@&#039;&#039;&#039;login01&#039;&#039;&#039;$ gnuplot&lt;br /&gt;
 ...&lt;br /&gt;
 Terminal type is now &#039;...&#039;&lt;br /&gt;
 gnuplot&amp;gt; plot &amp;lt;...&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Set up yambopy &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
In order to run yambopy on M100, you must first set up the conda environment (this needs to be done only once):&lt;br /&gt;
 $ cd&lt;br /&gt;
 $ module load anaconda/2020.11&lt;br /&gt;
 $ conda init bash&lt;br /&gt;
 $ source .bashrc&lt;br /&gt;
&lt;br /&gt;
After this, every time you want to use yambopy you need to load its module and environment:&lt;br /&gt;
 $ module load anaconda/2020.11&lt;br /&gt;
 $ conda activate /m100_work/tra23_Yambo/softwares/YAMBO/env_yambopy&lt;br /&gt;
&lt;br /&gt;
== Tutorials ==&lt;br /&gt;
&lt;br /&gt;
=== DAY 1 - Monday, 22 May === &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;16:15 - 18:30 From the DFT ground state to the complete setup of a Many Body calculation using Yambo&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To get the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
 $ ssh m100&lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
 $ mkdir YAMBO_TUTORIALS&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
 ...&lt;br /&gt;
 Saving to: ‘hBN.tar.gz’&lt;br /&gt;
 &lt;br /&gt;
 hBN.tar.gz                         100%[================================================================&amp;gt;]  10.81M  52.6MB/s    in 0.2s    &lt;br /&gt;
 ...&lt;br /&gt;
--&amp;gt; &lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-2D.tar.gz&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
 ...&lt;br /&gt;
 Saving to: ‘hBN-2D.tar.gz’&lt;br /&gt;
 &lt;br /&gt;
 hBN-2D.tar.gz                      100%[================================================================&amp;gt;]   8.56M  46.7MB/s    in 0.2s    &lt;br /&gt;
 ...&lt;br /&gt;
--&amp;gt; &lt;br /&gt;
 $ ls&lt;br /&gt;
 hBN-2D.tar.gz  hBN.tar.gz&lt;br /&gt;
 $ tar -xvf hBN-2D.tar.gz&lt;br /&gt;
 $ tar -xvf hBN.tar.gz&lt;br /&gt;
 $ ls&lt;br /&gt;
 &#039;&#039;&#039;hBN-2D&#039;&#039;&#039; &#039;&#039;&#039;hBN&#039;&#039;&#039; hBN-2D.tar.gz  hBN.tar.gz&lt;br /&gt;
&lt;br /&gt;
Now that you have all the files, you may open the interactive job session with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; as explained above and proceed with the tutorials.&lt;br /&gt;
&lt;br /&gt;
* [[First steps: walk through from DFT(standalone)|First steps: Initialization and more ]]&lt;br /&gt;
* [[Next steps: RPA calculations (standalone)|Next steps: RPA calculations ]]&lt;br /&gt;
&lt;br /&gt;
At this point, you may learn about the python pre- and post-processing capabilities offered by yambopy, our python interface to yambo and QE. First of all, let&#039;s create a dedicated directory, then download and extract the related files.&lt;br /&gt;
 &lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
 $ mkdir YAMBOPY_TUTORIALS&lt;br /&gt;
 $ cd YAMBOPY_TUTORIALS&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/databases_yambopy.tar&lt;br /&gt;
 $ tar -xvf databases_yambopy.tar&lt;br /&gt;
 $ cd databases_yambopy&lt;br /&gt;
&lt;br /&gt;
Then, follow &#039;&#039;&#039;the first three sections&#039;&#039;&#039; of this link, which are related to initialization and linear response.&lt;br /&gt;
* [[Yambopy tutorial: Yambo databases|Reading databases with yambopy]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 2 - Tuesday, 23 May ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;14:00 - 16:30 A tour through GW simulation in a complex material (from the blackboard to numerical computation: convergence, algorithms, parallel usage)&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To get all the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
&lt;br /&gt;
 ssh m100&lt;br /&gt;
 cd $CINECA_SCRATCH&lt;br /&gt;
 cd YAMBO_TUTORIALS&lt;br /&gt;
 wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz&lt;br /&gt;
 wget https://media.yambo-code.eu/educational/tutorials/files/MoS2_2Dquasiparticle_tutorial.tar.gz&lt;br /&gt;
 tar -xvf hBN.tar.gz&lt;br /&gt;
 tar -xvf MoS2_2Dquasiparticle_tutorial.tar.gz&lt;br /&gt;
 cd hBN&lt;br /&gt;
&lt;br /&gt;
Now you can start the first tutorial:&lt;br /&gt;
&lt;br /&gt;
* [[GW tutorial Rome 2023 | GW computations on practice: how to obtain the quasi-particle band structure of a bulk material ]]&lt;br /&gt;
&lt;br /&gt;
If you have gone through the first tutorial, pass now to the second one:&lt;br /&gt;
 &lt;br /&gt;
 cd $CINECA_SCRATCH&lt;br /&gt;
 cd YAMBO_TUTORIALS&lt;br /&gt;
 cd MoS2_HPC_tutorial&lt;br /&gt;
&lt;br /&gt;
* [[Quasi-particles of a 2D system | Quasi-particles of a 2D system ]]&lt;br /&gt;
&lt;br /&gt;
To conclude, you can learn another method to plot the band structure in Yambo:&lt;br /&gt;
&lt;br /&gt;
* [[Yambopy tutorial: band structures| Yambopy tutorial: band structures]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 3 - Wednesday, 24 May ===&lt;br /&gt;
&#039;&#039;&#039;14:00 - 16:30 Bethe-Salpeter equation (BSE)&#039;&#039;&#039; Fulvio Paleari (CNR-Nano, Italy), Davide Sangalli (CNR-ISM, Italy)&lt;br /&gt;
&lt;br /&gt;
To get the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
 $ ssh m100&lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz # NOTE: YOU SHOULD ALREADY HAVE THIS FROM DAY 1&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
 ...&lt;br /&gt;
 Saving to: ‘hBN.tar.gz’&lt;br /&gt;
 &lt;br /&gt;
 hBN.tar.gz                         100%[================================================================&amp;gt;]  10.81M  52.6MB/s    in 0.2s    &lt;br /&gt;
 ...&lt;br /&gt;
--&amp;gt; &lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-convergence-kpoints.tar.gz &lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
 ...&lt;br /&gt;
 Saving to: ‘hBN-convergence-kpoints.tar.gz’&lt;br /&gt;
&lt;br /&gt;
hBN-convergence-kpoints.tar.gz    100%[============================================================&amp;gt;] 226.10M  62.3MB/s    in 3.7s&lt;br /&gt;
 ...&lt;br /&gt;
--&amp;gt; &lt;br /&gt;
 $ tar -xvf hBN-convergence-kpoints.tar.gz&lt;br /&gt;
 $ tar -xvf hBN.tar.gz&lt;br /&gt;
&lt;br /&gt;
Now, you may open the interactive job session with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; and proceed with the following tutorials.&lt;br /&gt;
&lt;br /&gt;
* [[Calculating optical spectra including excitonic effects: a step-by-step guide|Perform a BSE calculation from beginning to end ]]&lt;br /&gt;
* [[How to analyse excitons - ICTP 2022 school|Analyse your results (exciton wavefunctions in real and reciprocal space, etc.) ]]&lt;br /&gt;
* [[BSE solvers overview|Solve the BSE eigenvalue problem with different numerical methods]]&lt;br /&gt;
* [[How to choose the input parameters|Choose the input parameters for a meaningful converged calculation]]&lt;br /&gt;
Now, go into the yambopy tutorial directory to learn about python analysis tools for the BSE:&lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
 $ cd YAMBOPY_TUTORIALS/databases_yambopy&lt;br /&gt;
&lt;br /&gt;
* [[Yambopy_tutorial:_Yambo_databases#Exciton_intro_1:_read_and_sort_data|Visualization of excitonic properties with yambopy]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;17:00 - 18:30 Bethe-Salpeter equation in real time (TD-HSEX)&#039;&#039;&#039; Fulvio Paleari (CNR-Nano, Italy), Davide Sangalli (CNR-ISM, Italy)&lt;br /&gt;
&lt;br /&gt;
The files needed for the following tutorials can be downloaded following these steps:&lt;br /&gt;
 $ ssh m100&lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-2D-RT.tar.gz&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
 ...&lt;br /&gt;
 Saving to: ‘hBN-2D-RT.tar.gz’&lt;br /&gt;
 &lt;br /&gt;
 hBN-2D-RT.tar.gz                         100%[================================================================&amp;gt;]  10.81M  52.6MB/s    in 0.2s    &lt;br /&gt;
 ...&lt;br /&gt;
--&amp;gt; &lt;br /&gt;
 $ tar -xvf hBN-2D-RT.tar.gz&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* [[Introduction_to_Real_Time_propagation_in_Yambo#Time_Dependent_Equation_for_the_Reduced_One--Body_Density--Matrix|Read the introductory section on real-time propagation for the one-body density matrix]] (the part about the time-dependent Schrödinger equation will be covered on DAY 4, so you can skip it for now)&lt;br /&gt;
* [[Prerequisites for Real Time propagation with Yambo|Perform the setup for a real-time calculation]]&lt;br /&gt;
* [[Linear response from real time simulations (density matrix only)|Calculate the linear response in real time]]&lt;br /&gt;
* [[Real time Bethe-Salpeter Equation (density matrix only)|Calculate the BSE in real time]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 4 - Thursday, May 25 ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;14:00 - 16:30 Real-time approach with the time dependent berry phase&#039;&#039;&#039; Myrta Gruning (Queen&#039;s University Belfast), Davide Sangalli (CNR-ISM, Italy)&lt;br /&gt;
&lt;br /&gt;
* [[Linear response from Bloch-states dynamics]] (in preparation)&lt;br /&gt;
* [[Second-harmonic generation of 2D-hBN]] (in preparation)&lt;br /&gt;
&lt;br /&gt;
* [[Real time approach to non-linear response]] (additional tutorial)&lt;br /&gt;
* [[Correlation effects in the non-linear response]] (additional tutorial)&lt;br /&gt;
&lt;br /&gt;
=== DAY 5 - Friday, 26 May ===&lt;br /&gt;
&lt;br /&gt;
== Lectures ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== DAY 1 - Monday, 22 May ===&lt;br /&gt;
&lt;br /&gt;
* D. Varsano, [https://media.yambo-code.eu/educational/Schools/ROME2023/scuola_intro.pdf Description and goal of the school].&lt;br /&gt;
* G. Stefanucci, [https://media.yambo-code.eu/educational/Schools/ROME2023/Stefanucci.pdf The Many-Body Problem: Key concepts of the Many-Body Perturbation Theory]&lt;br /&gt;
* M. Marsili, [https://media.yambo-code.eu/educational/Schools/ROME2023/marghe_linear_response.pdf Beyond the independent particle scheme: The linear response theory]&lt;br /&gt;
&lt;br /&gt;
=== DAY 2 - Tuesday, 23 May ===&lt;br /&gt;
&lt;br /&gt;
* E. Perfetto, [https://media.yambo-code.eu/educational/Schools/ROME2023/Talk_Perfetto.pdf An overview on non-equilibrium Green Functions]&lt;br /&gt;
* R. Frisenda, ARPES spectroscopy, an experimental overview&lt;br /&gt;
* A. Marini, The Quasi Particle concept and the GW method&lt;br /&gt;
* A. Guandalini, The GW method: approximations and algorithms&lt;br /&gt;
* D.A. Leon, C. Cardoso, Frequency dependence in GW: origin, modelling and practical implementations&lt;br /&gt;
&lt;br /&gt;
=== DAY 3 - Wednesday, 24 May ===&lt;br /&gt;
&lt;br /&gt;
* A. Molina-Sánchez, Modelling excitons: from 2D materials to Pump and Probe experiments&lt;br /&gt;
* M. Palummo, The Bethe-Salpeter equation: derivations and main physical concepts&lt;br /&gt;
* F. Paleari, Real time approach to the Bethe-Salpeter equation&lt;br /&gt;
* D. Sangalli, TD-HSEX and real-time dynamics&lt;br /&gt;
&lt;br /&gt;
=== DAY 4 - Thursday, 25 May ===&lt;br /&gt;
&lt;br /&gt;
* S. Mor, Time resolved spectroscopy: an  experimental overview&lt;br /&gt;
* M. Grüning, Nonlinear optics within Many-Body Perturbation Theory&lt;br /&gt;
* N. Tancogne-Dejean, Theory and simulation of High Harmonics Generation&lt;br /&gt;
* Y. Pavlyukh, Coherent electron-phonon dynamics within a time-linear GKBA scheme&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Rome_2023&amp;diff=6918</id>
		<title>Rome 2023</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Rome_2023&amp;diff=6918"/>
		<updated>2023-05-23T10:23:24Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* DAY 2 - Tuesday, 23 May */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A general description of the goal(s) of the school can be found on the [https://www.yambo-code.eu/2023/02/18/yambo-school-2023/ Yambo main website]&lt;br /&gt;
&lt;br /&gt;
== Use CINECA computational resources ==&lt;br /&gt;
Yambo tutorials will be run on the MARCONI100 (M100) accelerated cluster. You can find info about M100 [https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.2%3A+MARCONI100+UserGuide#UG3.2:MARCONI100UserGuide here].&lt;br /&gt;
In order to access computational resources provided by CINECA you need your personal username and password that were sent you by the organizers.&lt;br /&gt;
&lt;br /&gt;
=== Connect to the cluster using ssh ===&lt;br /&gt;
&lt;br /&gt;
You can access M100 via &amp;lt;code&amp;gt;ssh&amp;lt;/code&amp;gt; protocol in different ways.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Connect using username and password &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Use the following command replacing your username:&lt;br /&gt;
 $ ssh username@login.m100.cineca.it&lt;br /&gt;
&lt;br /&gt;
However, in this way you have to type your password each time you want to connect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Connect using ssh key &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You can set up an ssh key pair to avoid typing the password each time you want to connect to M100. To do so, go to your &amp;lt;code&amp;gt;.ssh&amp;lt;/code&amp;gt; directory (usually located in the &amp;lt;code&amp;gt;home&amp;lt;/code&amp;gt; directory):&lt;br /&gt;
 $ cd $HOME/.ssh&lt;br /&gt;
&lt;br /&gt;
If you don&#039;t have this directory, you can create it with &amp;lt;code&amp;gt;mkdir $HOME/.ssh&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Once you are in the &amp;lt;code&amp;gt;.ssh&amp;lt;/code&amp;gt; directory, run the &amp;lt;code&amp;gt;ssh-keygen&amp;lt;/code&amp;gt; command to generate a private/public key pair:&lt;br /&gt;
 $ ssh-keygen&lt;br /&gt;
 Generating public/private rsa key pair.&lt;br /&gt;
 Enter file in which to save the key: m100_id_rsa&lt;br /&gt;
 Enter passphrase (empty for no passphrase): &lt;br /&gt;
 Enter same passphrase again: &lt;br /&gt;
 Your identification has been saved in &amp;lt;your_.ssh_dir&amp;gt;/m100_id_rsa&lt;br /&gt;
 Your public key has been saved in &amp;lt;your_.ssh_dir&amp;gt;/m100_id_rsa.pub&lt;br /&gt;
 The key fingerprint is:&lt;br /&gt;
 &amp;lt;...&amp;gt;&lt;br /&gt;
 The key&#039;s randomart image is:&lt;br /&gt;
 &amp;lt;...&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you need to copy the &#039;&#039;&#039;public&#039;&#039;&#039; key to M100. You can do that with the following command (for this step you need to type your password):&lt;br /&gt;
 $ ssh-copy-id -i &amp;lt;your_.ssh_dir&amp;gt;/m100_id_rsa.pub &amp;lt;username&amp;gt;@login.m100.cineca.it&lt;br /&gt;
&lt;br /&gt;
Once the public key has been copied, you can connect to M100 without having to type the password using the &amp;lt;code&amp;gt;-i&amp;lt;/code&amp;gt; option:&lt;br /&gt;
 $ ssh -i &amp;lt;your_.ssh_dir&amp;gt;/m100_id_rsa username@login.m100.cineca.it&lt;br /&gt;
&lt;br /&gt;
To simplify even more, you can paste the following lines in a file named &amp;lt;code&amp;gt;config&amp;lt;/code&amp;gt; located inside the &amp;lt;code&amp;gt;.ssh&amp;lt;/code&amp;gt; directory adjusting username and path:&lt;br /&gt;
 Host m100 &lt;br /&gt;
  HostName login.m100.cineca.it&lt;br /&gt;
  User username&lt;br /&gt;
  IdentityFile &amp;lt;your_.ssh_dir&amp;gt;/m100_id_rsa&lt;br /&gt;
&lt;br /&gt;
With the &amp;lt;code&amp;gt;config&amp;lt;/code&amp;gt; file setup you can connect simply with&lt;br /&gt;
 $ ssh m100&lt;br /&gt;
&lt;br /&gt;
=== General instructions to run tutorials ===&lt;br /&gt;
&lt;br /&gt;
Before proceeding, it is useful to know the different workspaces you have available on M100, which can be accessed using environment variables. The main ones are:&lt;br /&gt;
* &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;: it&#039;s the &amp;lt;code&amp;gt;home&amp;lt;/code&amp;gt; directory associated to your username; &lt;br /&gt;
* &amp;lt;code&amp;gt;$WORK&amp;lt;/code&amp;gt;: it&#039;s the &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt; directory associated to the account where the computational resources dedicated to this school are allocated;&lt;br /&gt;
* &amp;lt;code&amp;gt;$CINECA_SCRATCH&amp;lt;/code&amp;gt;: it&#039;s the &amp;lt;code&amp;gt;scratch&amp;lt;/code&amp;gt; directory associated to your username.&lt;br /&gt;
You can find more details about storage and FileSystems [https://wiki.u-gov.it/confluence/display/SCAIUS/UG2.5%3A+Data+storage+and+FileSystems here].&lt;br /&gt;
&lt;br /&gt;
Please don&#039;t forget to &#039;&#039;&#039;run all tutorials in your scratch directory&#039;&#039;&#039;:&lt;br /&gt;
 $ echo $CINECA_SCRATCH&lt;br /&gt;
 /m100_scratch/userexternal/username&lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
&lt;br /&gt;
Computational resources on M100 are managed by the job scheduling system [https://slurm.schedmd.com/overview.html Slurm]. Most of the Yambo tutorials during this school can be run in serial, except for some that need to be executed on multiple processors. Generally, Slurm batch jobs are submitted using a script, but the tutorials here are better understood if run interactively. The two procedures that we will use to submit interactive and non-interactive jobs are explained below.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Run a job using a batch script &#039;&#039;&#039;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
This procedure is suggested for the tutorials and examples that need to be run in parallel. In these cases you need to submit the job using a batch script &amp;lt;code&amp;gt;job.sh&amp;lt;/code&amp;gt;, whose generic structure is the following:&lt;br /&gt;
 $ more job.sh&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --account=tra23_Yambo           # Charge resources used by this job to specified account&lt;br /&gt;
 #SBATCH --time=00:10:00                 # Set a limit on the total run time of the job allocation in hh:mm:ss&lt;br /&gt;
 #SBATCH --job-name=JOB                  # Specify a name for the job allocation&lt;br /&gt;
 #SBATCH --partition=m100_sys_test       # Request a specific partition for the resource allocation&lt;br /&gt;
 #SBATCH --qos=qos_test                  # qos = quality of service &lt;br /&gt;
 #SBATCH --reservation=s_tra_yambo       # Reservation specific to this school &lt;br /&gt;
 #          &lt;br /&gt;
 #SBATCH --nodes=&amp;lt;N&amp;gt;                     # Number of nodes to be allocated for the job&lt;br /&gt;
 #SBATCH --ntasks-per-node=&amp;lt;nt&amp;gt;          # Number of MPI tasks invoked per node&lt;br /&gt;
 #SBATCH --ntasks-per-socket=&amp;lt;nt/2&amp;gt;      # Tasks invoked on each socket&lt;br /&gt;
 #SBATCH --cpus-per-task=&amp;lt;nc&amp;gt;            # Number of OMP threads per task&lt;br /&gt;
 &lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary&lt;br /&gt;
 module load spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-mpi/bin:$PATH&lt;br /&gt;
 &lt;br /&gt;
 export OMP_NUM_THREADS=&amp;lt;nc&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 mpirun --rank-by core -np ${SLURM_NTASKS} \&lt;br /&gt;
        yambo -F &amp;lt;input&amp;gt; -J &amp;lt;output&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This procedure is suggested for the tutorials and examples that need to be run in parallel. In these cases you need to submit the job using a batch script &amp;lt;code&amp;gt;job.sh&amp;lt;/code&amp;gt;. Please note that the instructions in the batch script must be compatible with the specific M100 [https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.2%3A+MARCONI100+UserGuide#UG3.2:MARCONI100UserGuide-SystemArchitecture architecture] and [https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.2%3A+MARCONI100+UserGuide#UG3.2:MARCONI100UserGuide-Accounting accounting] systems. The complete list of Slurm options can be found [https://slurm.schedmd.com/sbatch.html here]. However you will find &#039;&#039;&#039;ready-to-use&#039;&#039;&#039; batch scripts in locations specified during the tutorials. &lt;br /&gt;
&lt;br /&gt;
To submit the job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 $ sbatch job.sh&lt;br /&gt;
 Submitted batch job &amp;lt;JOBID&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To check the job status, use the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command:&lt;br /&gt;
 $ squeue -u &amp;lt;username&amp;gt;&lt;br /&gt;
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              &amp;lt;...&amp;gt;  m100_...      JOB username  R       0:01    &amp;lt;N&amp;gt; &amp;lt;...&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you need to cancel your job, do:&lt;br /&gt;
 $ scancel &amp;lt;JOBID&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Open an interactive session &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This procedure is suggested for most of the tutorials, since the majority of them are meant to be run in serial (as far as MPI parallelization is concerned) from the command line. Use the command below to open an interactive session of 1 hour (complete documentation [https://slurm.schedmd.com/salloc.html here]):&lt;br /&gt;
 $ salloc -A tra23_Yambo -p m100_sys_test -q qos_test --reservation=s_tra_yambo --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 -t 01:00:00&lt;br /&gt;
 salloc: Granted job allocation 10164647&lt;br /&gt;
 salloc: Waiting for resource configuration&lt;br /&gt;
 salloc: Nodes r256n01 are ready for job&lt;br /&gt;
&lt;br /&gt;
We ask for 4 cpus-per-task because we can exploit OpenMP parallelization with the available resources.&lt;br /&gt;
&lt;br /&gt;
With &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; you can see that there is now a job running:&lt;br /&gt;
 $ squeue -u username&lt;br /&gt;
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
           10164647 m100_usr_ interact username  R       0:02      1 r256n01&lt;br /&gt;
&lt;br /&gt;
To run the tutorial, &amp;lt;code&amp;gt;ssh&amp;lt;/code&amp;gt; into the node specified by the job allocation and &amp;lt;code&amp;gt;cd&amp;lt;/code&amp;gt; to your scratch directory:&lt;br /&gt;
 username@&#039;&#039;&#039;login02&#039;&#039;&#039;$ ssh r256n01&lt;br /&gt;
 ...&lt;br /&gt;
 username@&#039;&#039;&#039;r256n01&#039;&#039;&#039;$ cd $CINECA_SCRATCH&lt;br /&gt;
&lt;br /&gt;
Then, you need to manually load &amp;lt;code&amp;gt;yambo&amp;lt;/code&amp;gt; as in the batch script above. Please note that the serial version of the code is in a different directory and does not need &amp;lt;code&amp;gt;spectrum_mpi&amp;lt;/code&amp;gt;:&lt;br /&gt;
 $ module purge&lt;br /&gt;
 $ module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 $ export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-cpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
Finally, set the &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; environment variable to 4 (as in the &amp;lt;code&amp;gt;--cpus-per-task&amp;lt;/code&amp;gt; option):&lt;br /&gt;
 $ export OMP_NUM_THREADS=4&lt;br /&gt;
&lt;br /&gt;
To close the interactive session when you have finished, log out of the compute node with the &amp;lt;code&amp;gt;exit&amp;lt;/code&amp;gt; command, and then cancel the job:&lt;br /&gt;
 $ exit&lt;br /&gt;
 $ scancel &amp;lt;JOBID&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Plot results with gnuplot &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
During the tutorials you will often need to plot the results of the calculations. In order to do so on M100, &#039;&#039;&#039;open a new terminal window&#039;&#039;&#039; and connect to M100 enabling X11 forwarding with the &amp;lt;code&amp;gt;-X&amp;lt;/code&amp;gt; option:&lt;br /&gt;
 $ ssh -X m100&lt;br /&gt;
&lt;br /&gt;
Please note that &amp;lt;code&amp;gt;gnuplot&amp;lt;/code&amp;gt; can be used in this way only from the login nodes:&lt;br /&gt;
 username@&#039;&#039;&#039;login01&#039;&#039;&#039;$ cd &amp;lt;directory_with_data&amp;gt;&lt;br /&gt;
 username@&#039;&#039;&#039;login01&#039;&#039;&#039;$ gnuplot&lt;br /&gt;
 ...&lt;br /&gt;
 Terminal type is now &#039;...&#039;&lt;br /&gt;
 gnuplot&amp;gt; plot &amp;lt;...&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; - Set up yambopy &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
In order to run yambopy on M100, you must first set up the conda environment (this needs to be done only once):&lt;br /&gt;
 $ cd&lt;br /&gt;
 $ module load anaconda/2020.11&lt;br /&gt;
 $ conda init bash&lt;br /&gt;
 $ source .bashrc&lt;br /&gt;
&lt;br /&gt;
After this, every time you want to use yambopy you need to load its module and environment:&lt;br /&gt;
 $ module load anaconda/2020.11&lt;br /&gt;
 $ conda activate /m100_work/tra23_Yambo/softwares/YAMBO/env_yambopy&lt;br /&gt;
&lt;br /&gt;
== Tutorials ==&lt;br /&gt;
&lt;br /&gt;
=== DAY 1 - Monday, 22 May === &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;16:15 - 18:30 From the DFT ground state to the complete setup of a Many Body calculation using Yambo&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To get the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
 $ ssh m100&lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
 $ mkdir YAMBO_TUTORIALS&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
 ...&lt;br /&gt;
 Saving to: ‘hBN.tar.gz’&lt;br /&gt;
 &lt;br /&gt;
 hBN.tar.gz                         100%[================================================================&amp;gt;]  10.81M  52.6MB/s    in 0.2s    &lt;br /&gt;
 ...&lt;br /&gt;
--&amp;gt; &lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-2D.tar.gz&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
 ...&lt;br /&gt;
 Saving to: ‘hBN-2D.tar.gz’&lt;br /&gt;
 &lt;br /&gt;
 hBN-2D.tar.gz                      100%[================================================================&amp;gt;]   8.56M  46.7MB/s    in 0.2s    &lt;br /&gt;
 ...&lt;br /&gt;
--&amp;gt; &lt;br /&gt;
 $ ls&lt;br /&gt;
 hBN-2D.tar.gz  hBN.tar.gz&lt;br /&gt;
 $ tar -xvf hBN-2D.tar.gz&lt;br /&gt;
 $ tar -xvf hBN.tar.gz&lt;br /&gt;
 $ ls&lt;br /&gt;
 &#039;&#039;&#039;hBN-2D&#039;&#039;&#039; &#039;&#039;&#039;hBN&#039;&#039;&#039; hBN-2D.tar.gz  hBN.tar.gz&lt;br /&gt;
&lt;br /&gt;
Now that you have all the files, you may open the interactive job session with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; as explained above and proceed with the tutorials.&lt;br /&gt;
&lt;br /&gt;
* [[First steps: walk through from DFT(standalone)|First steps: Initialization and more ]]&lt;br /&gt;
* [[Next steps: RPA calculations (standalone)|Next steps: RPA calculations ]]&lt;br /&gt;
&lt;br /&gt;
At this point, you may learn about the python pre- and post-processing capabilities offered by yambopy, our python interface to yambo and QE. First of all, let&#039;s create a dedicated directory, then download and extract the related files.&lt;br /&gt;
 &lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
 $ mkdir YAMBOPY_TUTORIALS&lt;br /&gt;
 $ cd YAMBOPY_TUTORIALS&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/databases_yambopy.tar&lt;br /&gt;
 $ tar -xvf databases_yambopy.tar&lt;br /&gt;
 $ cd databases_yambopy&lt;br /&gt;
&lt;br /&gt;
Then, follow &#039;&#039;&#039;the first three sections&#039;&#039;&#039; of this link, which are related to initialization and linear response.&lt;br /&gt;
* [[Yambopy tutorial: Yambo databases|Reading databases with yambopy]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 2 - Tuesday, 23 May ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;14:00 - 16:30 A tour through GW simulation in a complex material (from the blackboard to numerical computation: convergence, algorithms, parallel usage)&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To get all the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
&lt;br /&gt;
 ssh m100&lt;br /&gt;
 cd $CINECA_SCRATCH&lt;br /&gt;
 cd YAMBO_TUTORIALS&lt;br /&gt;
 wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz&lt;br /&gt;
 wget https://media.yambo-code.eu/educational/tutorials/files/MoS2_2Dquasiparticle_tutorial.tar.gz&lt;br /&gt;
 tar -xvf hBN.tar.gz&lt;br /&gt;
 tar -xvf MoS2_HPC_tutorial.tar.gz&lt;br /&gt;
 cd hBN&lt;br /&gt;
&lt;br /&gt;
Now you can start the first tutorial:&lt;br /&gt;
&lt;br /&gt;
* [[GW tutorial Rome 2023 | GW computations on practice: how to obtain the quasi-particle band structure of a bulk material ]]&lt;br /&gt;
&lt;br /&gt;
If you have gone through the first tutorial, pass now to the second one:&lt;br /&gt;
 &lt;br /&gt;
 cd $CINECA_SCRATCH&lt;br /&gt;
 cd YAMBO_TUTORIALS&lt;br /&gt;
 cd MoS2_HPC_tutorial&lt;br /&gt;
&lt;br /&gt;
* [[Quasi-particles of a 2D system | Quasi-particles of a 2D system ]]&lt;br /&gt;
&lt;br /&gt;
To conclude, you can learn another method to plot the band structure in Yambo:&lt;br /&gt;
&lt;br /&gt;
* [[Yambopy tutorial: band structures| Yambopy tutorial: band structures]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 3 - Wednesday, 24 May ===&lt;br /&gt;
&#039;&#039;&#039;14:00 - 16:30 Bethe-Salpeter equation (BSE)&#039;&#039;&#039; Fulvio Paleari (CNR-Nano, Italy), Davide Sangalli (CNR-ISM, Italy)&lt;br /&gt;
&lt;br /&gt;
To get the tutorial files needed for the following tutorials, follow these steps:&lt;br /&gt;
 $ ssh m100&lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz # NOTE: YOU SHOULD ALREADY HAVE THIS FROM DAY 1&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
 ...&lt;br /&gt;
 Saving to: ‘hBN.tar.gz’&lt;br /&gt;
 &lt;br /&gt;
 hBN.tar.gz                         100%[================================================================&amp;gt;]  10.81M  52.6MB/s    in 0.2s    &lt;br /&gt;
 ...&lt;br /&gt;
--&amp;gt; &lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-convergence-kpoints.tar.gz &lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
 ...&lt;br /&gt;
 Saving to: ‘hBN-convergence-kpoints.tar.gz’&lt;br /&gt;
&lt;br /&gt;
hBN-convergence-kpoints.tar.gz    100%[============================================================&amp;gt;] 226.10M  62.3MB/s    in 3.7s&lt;br /&gt;
 ...&lt;br /&gt;
--&amp;gt; &lt;br /&gt;
 $ tar -xvf hBN-convergence-kpoints.tar.gz&lt;br /&gt;
 $ tar -xvf hBN.tar.gz&lt;br /&gt;
&lt;br /&gt;
Now, you may open the interactive job session with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; and proceed with the following tutorials.&lt;br /&gt;
&lt;br /&gt;
* [[Calculating optical spectra including excitonic effects: a step-by-step guide|Perform a BSE calculation from beginning to end ]]&lt;br /&gt;
* [[How to analyse excitons - ICTP 2022 school|Analyse your results (exciton wavefunctions in real and reciprocal space, etc.) ]]&lt;br /&gt;
* [[BSE solvers overview|Solve the BSE eigenvalue problem with different numerical methods]]&lt;br /&gt;
* [[How to choose the input parameters|Choose the input parameters for a meaningful converged calculation]]&lt;br /&gt;
Now, go into the yambopy tutorial directory to learn about python analysis tools for the BSE:&lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
 $ cd YAMBOPY_TUTORIALS/databases_yambopy&lt;br /&gt;
&lt;br /&gt;
* [[Yambopy_tutorial:_Yambo_databases#Exciton_intro_1:_read_and_sort_data|Visualization of excitonic properties with yambopy]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;17:00 - 18:30 Bethe-Salpeter equation in real time (TD-HSEX)&#039;&#039;&#039; Fulvio Paleari (CNR-Nano, Italy), Davide Sangalli (CNR-ISM, Italy)&lt;br /&gt;
&lt;br /&gt;
The files needed for the following tutorials can be downloaded following these steps:&lt;br /&gt;
 $ ssh m100&lt;br /&gt;
 $ cd $CINECA_SCRATCH&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ wget https://media.yambo-code.eu/educational/tutorials/files/hBN-2D-RT.tar.gz&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
 ...&lt;br /&gt;
 Saving to: ‘hBN-2D-RT.tar.gz’&lt;br /&gt;
 &lt;br /&gt;
 hBN-2D-RT.tar.gz                         100%[================================================================&amp;gt;]  10.81M  52.6MB/s    in 0.2s    &lt;br /&gt;
 ...&lt;br /&gt;
--&amp;gt; &lt;br /&gt;
 $ tar -xvf hBN-2D-RT.tar.gz&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* [[Introduction_to_Real_Time_propagation_in_Yambo#Time_Dependent_Equation_for_the_Reduced_One--Body_Density--Matrix|Read the introductory section on real-time propagation for the one-body density matrix]] (the part about the time-dependent Schrödinger equation will be covered on DAY 4, so you can skip it for now)&lt;br /&gt;
* [[Prerequisites for Real Time propagation with Yambo|Perform the setup for a real-time calculation]]&lt;br /&gt;
* [[Linear response from real time simulations (density matrix only)|Calculate the linear response in real time]]&lt;br /&gt;
* [[Real time Bethe-Salpeter Equation (density matrix only)|Calculate the BSE in real time]]&lt;br /&gt;
&lt;br /&gt;
=== DAY 4 - Thursday, May 25 ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;14:00 - 16:30 Real-time approach with the time dependent berry phase&#039;&#039;&#039; Myrta Gruning (Queen&#039;s University Belfast), Davide Sangalli (CNR-ISM, Italy)&lt;br /&gt;
&lt;br /&gt;
* [[Linear response from Bloch-states dynamics]] (in preparation)&lt;br /&gt;
* [[Second-harmonic generation of 2D-hBN]] (in preparation)&lt;br /&gt;
&lt;br /&gt;
* [[Real time approach to non-linear response]] (additional tutorial)&lt;br /&gt;
* [[Correlation effects in the non-linear response]] (additional tutorial)&lt;br /&gt;
&lt;br /&gt;
=== DAY 5 - Friday, 26 May ===&lt;br /&gt;
&lt;br /&gt;
== Lectures ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== DAY 1 - Monday, 22 May ===&lt;br /&gt;
&lt;br /&gt;
* D. Varsano, [https://media.yambo-code.eu/educational/Schools/ROME2023/scuola_intro.pdf Description and goal of the school].&lt;br /&gt;
* G. Stefanucci, [https://media.yambo-code.eu/educational/Schools/ROME2023/Stefanucci.pdf The Many-Body Problem: Key concepts of the Many-Body Perturbation Theory]&lt;br /&gt;
* M. Marsili, [https://media.yambo-code.eu/educational/Schools/ROME2023/marghe_linear_response.pdf Beyond the independent particle scheme: The linear response theory]&lt;br /&gt;
&lt;br /&gt;
=== DAY 2 - Tuesday, 23 May ===&lt;br /&gt;
&lt;br /&gt;
* E. Perfetto, [https://media.yambo-code.eu/educational/Schools/ROME2023/Talk_Perfetto.pdf An overview on non-equilibrium Green Functions]&lt;br /&gt;
* R. Frisenda, ARPES spectroscopy, an experimental overview&lt;br /&gt;
* A. Marini, The Quasi Particle concept and the GW method&lt;br /&gt;
* A. Guandalini, The GW method: approximations and algorithms&lt;br /&gt;
* D.A. Leon, C. Cardoso, Frequency dependence in GW: origin, modelling and practical implementations&lt;br /&gt;
&lt;br /&gt;
=== DAY 3 - Wednesday, 24 May ===&lt;br /&gt;
&lt;br /&gt;
* A. Molina-Sánchez, Modelling excitons: from 2D materials to Pump and Probe experiments&lt;br /&gt;
* M. Palummo, The Bethe-Salpeter equation: derivations and main physical concepts&lt;br /&gt;
* F. Paleari, Real time approach to the Bethe-Salpeter equation&lt;br /&gt;
* D. Sangalli, TD-HSEX and real-time dynamics&lt;br /&gt;
&lt;br /&gt;
=== DAY 4 - Thursday, 25 May ===&lt;br /&gt;
&lt;br /&gt;
* S. Mor, Time resolved spectroscopy: an  experimental overview&lt;br /&gt;
* M. Grüning, Nonlinear optics within Many-Body Perturbation Theory&lt;br /&gt;
* N. Tancogne-Dejean, Theory and simulation of High Harmonics Generation&lt;br /&gt;
* Y. Pavlyukh, Coherent electron-phonon dynamics within a time-linear GKBA scheme&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Tutorials&amp;diff=6917</id>
		<title>Tutorials</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Tutorials&amp;diff=6917"/>
		<updated>2023-05-23T10:22:46Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Files needed for stand-alone tutorials */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Whether you are starting out with Yambo or are already an experienced user, we recommend that you complete the following tutorials before trying to use Yambo for your own system.&lt;br /&gt;
&lt;br /&gt;
The tutorials are meant to give some introductory background to the key concepts behind Yambo. Practical topics such as convergence are also discussed. &lt;br /&gt;
Nonetheless, users are invited to first read and study the [[lectures|background material]] in order to get familiar with the fundamental physical quantities.&lt;br /&gt;
&lt;br /&gt;
Two kinds of tutorials are provided: &#039;&#039;&#039;stand-alone&#039;&#039;&#039; and &#039;&#039;&#039;modular&#039;&#039;&#039;. In addition, you need a working environment where Yambo (and possibly QE) is installed.&lt;br /&gt;
&lt;br /&gt;
== Setting up Yambo (and eventually QE)  ==&lt;br /&gt;
To be able to follow the school, you need a working installation of the yambo/QE codes.&lt;br /&gt;
&lt;br /&gt;
There are several different ways to prepare a working environment.&lt;br /&gt;
&lt;br /&gt;
=== Virtual Machine(s) ===&lt;br /&gt;
The easiest way is to access a virtual machine which contains both (i) yambo/QE and (ii) the tutorials.&lt;br /&gt;
&lt;br /&gt;
You can do it in one of two ways:&lt;br /&gt;
* Virtual machine via the [[ICTP cloud|ICTP cloud]]. If the school you are attending provides an ICTP virtual machine, this is the preferred option. It works through an internet connection inside a browser.&lt;br /&gt;
* Install the [[Yambo_Virtual_Machine|yambo virtual machine]] on your laptop / desktop. This requires Oracle VirtualBox and a pre-downloaded copy of the virtual machine; no internet connection is needed afterwards.&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
URL of the machine : https://ins45100.ictp.it/&lt;br /&gt;
READ-ONLY PASSWORD for TUTOR (ictptutor) access:&lt;br /&gt;
NairibiTutor&lt;br /&gt;
READ-WRITE password for PARTICIPANT&#039;s (ictpuser) access:&lt;br /&gt;
NairobiUser&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== User installation  ===&lt;br /&gt;
You can also set up the yambo code on your own laptop / desktop using different methods.&lt;br /&gt;
&lt;br /&gt;
As far as the Yambo source is concerned you can:&lt;br /&gt;
* Install [[Yambo via Docker|Yambo via Docker]]&lt;br /&gt;
* [[download|Download]] and [[Installation|install]] yambo on your laptop / desktop (requires a linux machine).&lt;br /&gt;
* Install yambo on your laptop/desktop/cluster [https://github.com/nicspalla/my-repo via Spack].&lt;br /&gt;
* Install using  Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Yambo User Installation with Anaconda ===&lt;br /&gt;
It is possible to install Yambo (up to v5.0.4) and Quantum-ESPRESSO via conda-forge (a conda channel/repository).&lt;br /&gt;
To set up Anaconda, please start by installing [https://www.anaconda.com/products/distribution#Downloads Anaconda] or [https://docs.conda.io/en/latest/miniconda.html Miniconda].&lt;br /&gt;
&lt;br /&gt;
Then we suggest creating a conda environment and activating it:&lt;br /&gt;
 conda create --name yambopy -c conda-forge&lt;br /&gt;
 conda activate yambopy&lt;br /&gt;
Then you can install the prerequisites and the two codes:&lt;br /&gt;
 conda install numpy scipy netcdf4 matplotlib pyyaml lxml pandas&lt;br /&gt;
 conda install yambo &lt;br /&gt;
 conda install qe&lt;br /&gt;
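&lt;br /&gt;
To check that the executables are now available in the active environment, you can simply query your shell (a minimal check; the exact paths will depend on where conda is installed):&lt;br /&gt;
 which yambo&lt;br /&gt;
 which pw.x&lt;br /&gt;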
&lt;br /&gt;
==Setting up Yambopy==&lt;br /&gt;
&lt;br /&gt;
===Quick installation===&lt;br /&gt;
&lt;br /&gt;
A quick way to start using Yambopy is described here.&lt;br /&gt;
&lt;br /&gt;
* Make sure that you are using Python 3 and that you have the following python packages: &amp;lt;code&amp;gt;numpy&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;scipy&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;matplotlib&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;netCDF4&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;lxml&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;pyyaml&amp;lt;/code&amp;gt;. Optionally, you may want to have abipy [[https://abinit.github.io/abipy/index.html]] installed for band structure interpolations.&lt;br /&gt;
&lt;br /&gt;
* Go to a directory of your choice and clone yambopy from the git repository&lt;br /&gt;
&lt;br /&gt;
 git clone https://github.com/yambo-code/yambopy.git&lt;br /&gt;
&lt;br /&gt;
If you don&#039;t want to use git, you may download a tarball from the git repository instead.&lt;br /&gt;
&lt;br /&gt;
* Enter into the yambopy folder and install&lt;br /&gt;
&lt;br /&gt;
 cd yambopy&lt;br /&gt;
 sudo python setup.py install&lt;br /&gt;
&lt;br /&gt;
If you don&#039;t have administrative privileges (for example on a computing cluster), type instead&lt;br /&gt;
&lt;br /&gt;
 cd yambopy&lt;br /&gt;
 python setup.py install --user&lt;br /&gt;
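&lt;br /&gt;
To quickly verify the installation, you can try importing the package from Python (a minimal check, assuming the package is importable as &amp;lt;code&amp;gt;yambopy&amp;lt;/code&amp;gt;):&lt;br /&gt;
 python -c &#039;import yambopy&#039;&lt;br /&gt;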
&lt;br /&gt;
===Installing dependencies with Anaconda===&lt;br /&gt;
We suggest installing yambopy using Anaconda [[https://www.anaconda.com/products/distribution]] to manage the various python packages.&lt;br /&gt;
&lt;br /&gt;
In this case, you can follow these steps.&lt;br /&gt;
&lt;br /&gt;
First, install the required dependencies:&lt;br /&gt;
 conda install numpy scipy netcdf4 lxml pyyaml&lt;br /&gt;
&lt;br /&gt;
Then we create a conda environment based on python 3.6 (this is to ensure compatibility with abipy if we want to install it later on):&lt;br /&gt;
 conda create --name NAME_ENV python=3.6&lt;br /&gt;
Here choose &amp;lt;code&amp;gt;NAME_ENV&amp;lt;/code&amp;gt; as you want, e.g. &amp;lt;code&amp;gt;yenv&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
Now, we install abipy and its dependency pymatgen using &amp;lt;code&amp;gt;pip&amp;lt;/code&amp;gt;. Here make sure that you are using the &amp;lt;code&amp;gt;pip&amp;lt;/code&amp;gt; version provided by Anaconda and not your system version.&lt;br /&gt;
&lt;br /&gt;
 pip install pymatgen&lt;br /&gt;
 pip install abipy&lt;br /&gt;
&lt;br /&gt;
Finally, we are ready to install yambopy:&lt;br /&gt;
&lt;br /&gt;
 git clone https://github.com/yambo-code/yambopy.git &lt;br /&gt;
&lt;br /&gt;
(or download and extract tarball) and follow the steps outlined in the quick installation section.&lt;br /&gt;
&lt;br /&gt;
Now enter into the yambopy folder and install&lt;br /&gt;
&lt;br /&gt;
 cd yambopy&lt;br /&gt;
 sudo python setup.py install&lt;br /&gt;
&lt;br /&gt;
If you don&#039;t have administrative privileges (for example on a computing cluster), type instead&lt;br /&gt;
&lt;br /&gt;
 cd yambopy&lt;br /&gt;
 python setup.py install --user&lt;br /&gt;
&lt;br /&gt;
===Frequent issues===&lt;br /&gt;
When running the installation you may get a &amp;lt;code&amp;gt;SyntaxError&amp;lt;/code&amp;gt; related to utf-8 encoding or it may complain that module &amp;lt;code&amp;gt;setuptools&amp;lt;/code&amp;gt; is not installed even though it is. In this case, it means that the &amp;lt;code&amp;gt;sudo&amp;lt;/code&amp;gt; command is not preserving the correct &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt; for your python executable.&lt;br /&gt;
&lt;br /&gt;
Solve the problem by running the installation step as&lt;br /&gt;
&lt;br /&gt;
 sudo /your/path/to/python setup.py install&lt;br /&gt;
or&lt;br /&gt;
 sudo env PATH=$PATH python setup.py install&lt;br /&gt;
&lt;br /&gt;
This applies only to the installation step and not to subsequent yambopy use.&lt;br /&gt;
&lt;br /&gt;
== Tutorial files ==&lt;br /&gt;
The tutorial CORE databases can be obtained&lt;br /&gt;
&lt;br /&gt;
* from the [[Yambo_Virtual_Machine|Yambo Virtual Machine]] &lt;br /&gt;
* from the Yambo web-page&lt;br /&gt;
* from the Yambo GIT tutorial repository &lt;br /&gt;
&lt;br /&gt;
=== From the Yambo Virtual Machine (VM)  ===&lt;br /&gt;
If you are using the VM, a recent version of the tutorial files is provided. Follow these [[Yambo_Virtual_Machine#Updating_the_Yambo_tutorial_files| instructions]] to update the tutorial files to the most recent version.&lt;br /&gt;
&lt;br /&gt;
=== From the Yambo website ===&lt;br /&gt;
If you are using your own installation or the docker, the files needed to run the tutorials can be downloaded from the lists below. &lt;br /&gt;
&lt;br /&gt;
After downloading the tar.gz files, just unpack them in the &#039;&#039;&#039;YAMBO_TUTORIALS&#039;&#039;&#039; folder. For example:&lt;br /&gt;
 $ mkdir YAMBO_TUTORIALS&lt;br /&gt;
 $ mv hBN.tar.gz YAMBO_TUTORIALS&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ tar -xzvf hBN.tar.gz&lt;br /&gt;
 $ ls&lt;br /&gt;
   hBN&lt;br /&gt;
&lt;br /&gt;
====Files needed for modular tutorials====&lt;br /&gt;
All of the following should be downloaded prior to following the modular tutorials:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Tutorial !! File(s)&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;4&amp;quot;| hBN || [https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz hBN.tar.gz] &lt;br /&gt;
|-&lt;br /&gt;
| [https://media.yambo-code.eu/educational/tutorials/files/hBN-convergence-kpoints.tar.gz hBN-convergence-kpoints.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| [https://media.yambo-code.eu/educational/tutorials/files/hBN-2D.tar.gz hBN-2D.tar.gz] &lt;br /&gt;
|-&lt;br /&gt;
| [https://media.yambo-code.eu/educational/tutorials/files/hBN-2D-para.tar.gz hBN-2D-para.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
====Files needed for stand-alone tutorials====&lt;br /&gt;
At the start of each tutorial you will be told which specific file needs to be downloaded:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Tutorial !! File(s)&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;| Silicon || [https://media.yambo-code.eu/educational/tutorials/files/Silicon.tar.gz Silicon.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
|[https://media.yambo-code.eu/educational/tutorials/files/Silicon_Electron-Phonon.tar.gz Silicon_Electron-Phonon.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| LiF || [https://media.yambo-code.eu/educational/tutorials/files/LiF.tar.gz LiF.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| Aluminum || [https://media.yambo-code.eu/educational/tutorials/files/Aluminum.tar.gz Aluminum.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| GaSb || [https://media.yambo-code.eu/educational/tutorials/files/GaSb.tar.gz GaSb.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| AlAs || [https://media.yambo-code.eu/educational/tutorials/files/AlAs.tar.gz AlAs.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| Hydrogen_Chain || [https://media.yambo-code.eu/educational/tutorials/files/Hydrogen_Chain.tar.gz Hydrogen_Chain.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| MoS2 for HPC || [https://media.yambo-code.eu/educational/tutorials/files/MoS2_HPC_tutorial.tar.gz MoS2_HPC_tutorial.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| MoS2 for HPC shorter version || [https://media.yambo-code.eu/educational/tutorials/files/MoS2_2Dquasiparticle_tutorial.tar.gz MoS2_2Dquasiparticle_tutorial.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| Yambopy for QE || [https://media.yambo-code.eu/educational/tutorials/files/databases_qepy.tar databases_qepy]&lt;br /&gt;
|-&lt;br /&gt;
| Yambopy for YAMBO || [https://media.yambo-code.eu/educational/tutorials/files/databases_yambopy.tar databases_yambopy]&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== From the Git Tutorial Repository (advanced users) ===&lt;br /&gt;
If you are using your own installation or the docker, the [https://github.com/yambo-code/tutorials tutorials repository] contains the updated tutorial CORE databases. To use it:&lt;br /&gt;
 $ git clone https://github.com/yambo-code/tutorials.git YAMBO_TUTORIALS&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ ./setup.pl -install&lt;br /&gt;
&lt;br /&gt;
== Stand-alone tutorials ==&lt;br /&gt;
These tutorials are self-contained and cover a variety of mixed topics, both physical and methodological. They are designed to be followed from start to finish in one page and do not require previous knowledge of yambo. Each tutorial requires downloading a specific core database and typically covers a specific physical system (like bulk GaSb or a hydrogen chain). Ground state input files and pseudopotentials are provided. Output files are also provided for reference.&lt;br /&gt;
&lt;br /&gt;
These tutorials can be accessed directly from this page or from the side bar. They cover different kinds of subjects:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Warning&#039;&#039;&#039;: These tutorials were prepared using previous versions of the Yambo code: some command lines, variables, reports and outputs may differ slightly from the latest version of the code. Scripts for parsing the output may no longer work and should be edited to match the new outputs. The updated command lines can be listed by typing &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Basic ===&lt;br /&gt;
* [[LiF|Linear Response in 3D. Excitons at work]]&lt;br /&gt;
* [[Silicon|GW convergence]]&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* [[Si_Surface|Linear Response in 2D]]&lt;br /&gt;
* [[Si_wire|Linear Response in 1D]]&lt;br /&gt;
* [[H2|Linear Response in 0D]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Post Processing ===&lt;br /&gt;
* [[Yambo Post Processing (ypp)]]&lt;br /&gt;
* [https://www.yambo-code.org/wiki/index.php?title=First_steps_in_Yambopy First Steps in YamboPy]&lt;br /&gt;
* [https://www.yambo-code.org/wiki/index.php?title=Yambopy_tutorial:_band_structures Yambopy tutorial: band structures]&lt;br /&gt;
* [https://www.yambo-code.org/wiki/index.php?title=Yambopy_tutorial:_Yambo_databases Yambopy tutorial: Yambo databases]&lt;br /&gt;
&lt;br /&gt;
=== Advanced ===&lt;br /&gt;
* [[Hydrogen chain|TDDFT Failure and long range correlations]]&lt;br /&gt;
* [[Linear response from real time simulations]]&lt;br /&gt;
&lt;br /&gt;
==== GW and Quasi-particles ====&lt;br /&gt;
* [[Real_Axis_and_Lifetimes|Real Axis and Lifetimes]]&lt;br /&gt;
* [[Self-consistent GW on eigenvalues only]]&lt;br /&gt;
* [[GW tutorial on HPC]]&lt;br /&gt;
&lt;br /&gt;
==== Electron phonon coupling ====&lt;br /&gt;
* [[Electron Phonon Coupling|Electron Phonon Coupling]]&lt;br /&gt;
* [[Optical properties at finite temperature]]&lt;br /&gt;
* [[Phonon-assisted luminescence by finite atomic displacements]]&lt;br /&gt;
* [[Exciton-phonon coupling and luminescence]]&lt;br /&gt;
&lt;br /&gt;
==== Non linear response ====&lt;br /&gt;
* [http://www.attaccalite.com/lumen/linear_response.html Linear response using Dynamical Berry phase]&lt;br /&gt;
* [[Real time approach to non-linear response]]&lt;br /&gt;
* [[Correlation effects in the non-linear response]]&lt;br /&gt;
* [http://www.attaccalite.com/lumen/thg_in_silicon.html Third Harmonic Generation]&lt;br /&gt;
* [http://www.attaccalite.com/lumen/spin_orbit.html Spin-orbit coupling and non-linear response]&lt;br /&gt;
* [[Two-photon absorption]] &lt;br /&gt;
* [[Pump and Probe]]&lt;br /&gt;
* [[Parallelization for non-linear response calculations]]&lt;br /&gt;
&lt;br /&gt;
==== Developing Yambo ====&lt;br /&gt;
* [[How to create a new project in Yambo]]&lt;br /&gt;
* [[How to create a new ypp interface]]&lt;br /&gt;
* [[Some hints on github]]&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* [[SOC|Spin-Orbit Coupling MBPT]]&lt;br /&gt;
* [[Kerr|Kerr]]&lt;br /&gt;
* [[Real_Time|Real-Time]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- For each TUTORIAL (Solid_LiF, Solid_Al, ...) , therefore, you can download the ground state files (zip archive named TUTORIAL_ground_state.zip) and generate the Yambo databases from your own by running abinit/PWscf and a2y/p2y. In this case the Yambo input and reference files are contained in the zip file (TUTORIAL_reference_files.zip). Alternatively, if (and only if) you have compiled yambo with the NetCDF support you can directly download the zip files containing the Yambo core databases (TUTORIAL_NETCDF_databases_and_reference_files.zip). These are generated using the NetCDF interface in order to be readable in any platform.&lt;br /&gt;
After you have downloaded the tutorial zip files and unziped them you should have now a tutorial tree:&lt;br /&gt;
localhost:&amp;gt; ls &lt;br /&gt;
YAMBO_TUTORIALS/&lt;br /&gt;
localhost:&amp;gt; ls  YAMBO_TUTORIALS/&lt;br /&gt;
COPYING  Fantastic_Dimensions/  Hydrogen_Chain/  README  Solid_LiF/ Solid_Al/ SiH4/ ...&lt;br /&gt;
In each folder you will find an Abinit or Pwscf subfolder in case you have downloaded the ground state zip files and the YAMBO subfolder. The tutorials start by entering the YAMBO subfolder and followinf the informations provided in the tutorial documentation.  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Modular tutorials ==&lt;br /&gt;
These tutorials are designed to provide a deeper understanding of specific yambo tasks and runlevels, and are structured to avoid repetition of common procedures and physical concepts. As such, they make use of the same physical systems: bulk hexagonal boron nitride &#039;&#039;hBN&#039;&#039; and an hBN sheet &#039;&#039;hBN-2D&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Warning&#039;&#039;&#039;: These tutorials were prepared using previous versions of the Yambo code: some command lines, variables, reports and outputs may differ slightly from the latest version of the code. Scripts for parsing the output may no longer work and should be edited to match the new outputs. The updated command lines can be listed by typing &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Introduction====&lt;br /&gt;
* [[First steps: a walk through from DFT to optical properties]]&lt;br /&gt;
====Quasiparticles in the GW approximation====&lt;br /&gt;
* [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
====Using Yambo in Parallel====&lt;br /&gt;
These modules contain a very general discussion of the parallel environment of Yambo. However, the actual runs of the code are specific to the CECAM cluster. If you want to run these modules elsewhere, just replace the parallel queue instructions with simple MPI commands (see the example below the list).&lt;br /&gt;
&lt;br /&gt;
* [[GW_parallel_strategies|Parallel GW (CECAM specific)]]: strategies for running Yambo in parallel&lt;br /&gt;
[[GW_parallel_strategies_CECAM]]&lt;br /&gt;
* [[Pushing_convergence_in_parallel|GW convergence (CECAM specific)]]: use Yambo in parallel to converge a GW calculation for a layer of hBN (hBN-2D)&lt;br /&gt;
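&lt;br /&gt;
For instance, where these modules ask you to submit a job to the CECAM queue, you could instead launch Yambo directly through MPI (a generic sketch; adapt the number of tasks and the input/job names to your case):&lt;br /&gt;
 mpirun -np 4 yambo -F yambo.in -J run1&lt;br /&gt;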
&lt;br /&gt;
====Excitons and the Bethe-Salpeter Equation====&lt;br /&gt;
* [[How to obtain an optical spectrum|Calculating optical spectra including excitonic effects: a step-by-step guide]]&lt;br /&gt;
* [[How to choose the input parameters|Obtaining a converged optical spectrum]] &lt;br /&gt;
* [[How to treat low dimensional systems|Many-body effects in low-dimensional systems: numerical issues and remedies]] &lt;br /&gt;
* [[How to analyse excitons|Analysis of excitonic spectra in a 2D material]]&lt;br /&gt;
&amp;lt;!--* [[Two particle excitations]] (try to bypass this page) : Learn how to set up and run calculations to obtain and analyze an optical absorption spectrum of bulk and low dimension materials by using the Bethe-Salpeter equation--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Yambopy====&lt;br /&gt;
* [[First steps in Yambopy]]&lt;br /&gt;
* [[GW tutorial. Convergence and approximations (BN)]]&lt;br /&gt;
* [[Bethe-Salpeter equation tutorial. Optical absorption (BN)]]&lt;br /&gt;
* [[Yambopy tutorial: band structures | Database and plotting tutorial for quantum espresso: qepy]]&lt;br /&gt;
* [[Yambopy tutorial: Yambo databases | Database and plotting tutorial for yambo: yambopy ]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
====Real-time simulations====&lt;br /&gt;
* [[Breaking of symmetries]]&lt;br /&gt;
* [[Independent-Particle Approximation Dynamics. Delta Pulse]]&lt;br /&gt;
* [[Post-processing. Optical Response]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Modules ===&lt;br /&gt;
Alternatively, users can learn more about a specific runlevel or task by looking at the individual &#039;&#039;&#039;[[Modules|documentation modules]]&#039;&#039;&#039;. These provide a focus on the input parameters, run time behaviour, and underlying physics. Although they can be followed separately, non-experts are urged to follow them as part of the more structured tutorials given above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== &amp;lt;span id=&amp;quot;Schools&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Schools ==&lt;br /&gt;
* [[ICTP2020]]&lt;br /&gt;
* [[CECAM VIRTUAL 2021]]&lt;br /&gt;
* [https://www.yambo-code.org/wiki/index.php?title=ICTP_2022 ICTP2022]&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Tutorials&amp;diff=6916</id>
		<title>Tutorials</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Tutorials&amp;diff=6916"/>
		<updated>2023-05-23T10:22:24Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Files needed for stand-alone tutorials */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Whether you are just starting out with Yambo or are already an experienced user, we recommend that you complete the following tutorials before trying to use Yambo for your own system.&lt;br /&gt;
&lt;br /&gt;
The tutorials are meant to give some introductory background to the key concepts behind Yambo. Practical topics such as convergence are also discussed. &lt;br /&gt;
Nonetheless, users are invited to first read and study the [[lectures|background material]] in order to get familiar with the fundamental physical quantities.&lt;br /&gt;
&lt;br /&gt;
Two kinds of tutorials are provided: &#039;&#039;&#039;stand-alone&#039;&#039;&#039; and &#039;&#039;&#039;modular&#039;&#039;&#039;. In addition, you need a working environment where Yambo (and possibly QE) is installed.&lt;br /&gt;
&lt;br /&gt;
== Setting up Yambo (and optionally QE)  ==&lt;br /&gt;
To be able to follow the school you need a running version of the yambo/QE code.&lt;br /&gt;
&lt;br /&gt;
There are several different ways to prepare a working environment.&lt;br /&gt;
&lt;br /&gt;
=== Virtual Machine(s) ===&lt;br /&gt;
The easiest way is to access a virtual machine which contains both (i) yambo/QE and (ii) the tutorials.&lt;br /&gt;
&lt;br /&gt;
You can do it in one of two ways:&lt;br /&gt;
* Virtual machine via the [[ICTP cloud|ICTP cloud]]. If the school you are attending provides an ICTP virtual machine, this is the preferred option. It works through an internet connection inside a browser.&lt;br /&gt;
* Install the [[Yambo_Virtual_Machine|yambo virtual machine]] on your laptop / desktop. This requires Oracle VirtualBox and a pre-downloaded copy of the virtual machine; no internet connection is needed afterwards.&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
URL of the machine : https://ins45100.ictp.it/&lt;br /&gt;
READ-ONLY PASSWORD for TUTOR (ictptutor) access:&lt;br /&gt;
NairibiTutor&lt;br /&gt;
READ-WRITE password for PARTICIPANT&#039;s (ictpuser) access:&lt;br /&gt;
NairobiUser&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== User installation  ===&lt;br /&gt;
You can also set up the yambo code on your own laptop / desktop using different methods.&lt;br /&gt;
&lt;br /&gt;
As far as the Yambo source is concerned you can:&lt;br /&gt;
* Install [[Yambo via Docker|Yambo via Docker]]&lt;br /&gt;
* [[download|Download]] and [[Installation|install]] yambo on your laptop / desktop (requires a linux machine).&lt;br /&gt;
* Install yambo on your laptop/desktop/cluster [https://github.com/nicspalla/my-repo via Spack].&lt;br /&gt;
* Install using  Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Yambo User Installation with Anaconda ===&lt;br /&gt;
It is possible to install Yambo (up to v5.0.4) and Quantum-ESPRESSO via conda-forge (a conda channel/repository).&lt;br /&gt;
To set up Anaconda, please start by installing [https://www.anaconda.com/products/distribution#Downloads Anaconda] or [https://docs.conda.io/en/latest/miniconda.html Miniconda].&lt;br /&gt;
&lt;br /&gt;
Then we suggest creating a conda environment and activating it:&lt;br /&gt;
 conda create --name yambopy -c conda-forge&lt;br /&gt;
 conda activate yambopy&lt;br /&gt;
Then you can install the prerequisites and the two codes:&lt;br /&gt;
 conda install numpy scipy netcdf4 matplotlib pyyaml lxml pandas&lt;br /&gt;
 conda install yambo &lt;br /&gt;
 conda install qe&lt;br /&gt;
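&lt;br /&gt;
To quickly check that the conda installation succeeded, you can verify that the executables are found inside the active environment, for instance:&lt;br /&gt;
 which yambo pw.x&lt;br /&gt;
 yambo -h&lt;br /&gt;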
&lt;br /&gt;
==Setting up Yambopy==&lt;br /&gt;
&lt;br /&gt;
===Quick installation===&lt;br /&gt;
&lt;br /&gt;
A quick way to start using Yambopy is described here.&lt;br /&gt;
&lt;br /&gt;
* Make sure that you are using Python 3 and that you have the following python packages: &amp;lt;code&amp;gt;numpy&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;scipy&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;matplotlib&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;netCDF4&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;lxml&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;pyyaml&amp;lt;/code&amp;gt;. Optionally, you may want to have [https://abinit.github.io/abipy/index.html abipy] installed for band structure interpolations.&lt;br /&gt;
&lt;br /&gt;
* Go to a directory of your choice and clone yambopy from the git repository&lt;br /&gt;
&lt;br /&gt;
 git clone https://github.com/yambo-code/yambopy.git&lt;br /&gt;
&lt;br /&gt;
If you don&#039;t want to use git, you may download a tarball from the git repository instead.&lt;br /&gt;
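For instance (a sketch, assuming the default branch is called &amp;lt;code&amp;gt;master&amp;lt;/code&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/yambo-code/yambopy/archive/refs/heads/master.tar.gz&lt;br /&gt;
 tar -xzvf master.tar.gz&lt;br /&gt;
&lt;br /&gt;
Note that in this case the extracted folder is named after the branch (e.g. &amp;lt;code&amp;gt;yambopy-master&amp;lt;/code&amp;gt;), so adapt the folder name in the steps below accordingly.&lt;br /&gt;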
&lt;br /&gt;
* Enter into the yambopy folder and install&lt;br /&gt;
&lt;br /&gt;
 cd yambopy&lt;br /&gt;
 sudo python setup.py install&lt;br /&gt;
&lt;br /&gt;
If you don&#039;t have administrative privileges (for example on a computing cluster), type instead&lt;br /&gt;
&lt;br /&gt;
 cd yambopy&lt;br /&gt;
 python setup.py install --user&lt;br /&gt;
&lt;br /&gt;
===Installing dependencies with Anaconda===&lt;br /&gt;
We suggest installing yambopy using [https://www.anaconda.com/products/distribution Anaconda] to manage the various python packages.&lt;br /&gt;
&lt;br /&gt;
In this case, you can follow these steps.&lt;br /&gt;
&lt;br /&gt;
First, install the required dependencies:&lt;br /&gt;
 conda install numpy scipy netcdf4 lxml pyyaml&lt;br /&gt;
&lt;br /&gt;
Then we create a conda environment based on python 3.6 (this is to ensure compatibility with abipy if we want to install it later on):&lt;br /&gt;
 conda create --name NAME_ENV python=3.6&lt;br /&gt;
Here choose &amp;lt;code&amp;gt;NAME_ENV&amp;lt;/code&amp;gt; as you want, e.g. &amp;lt;code&amp;gt;yenv&amp;lt;/code&amp;gt;. &lt;br /&gt;
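If you follow this route, remember to activate the environment before installing packages into it (a standard conda step, shown here for completeness):&lt;br /&gt;
 conda activate NAME_ENV&lt;br /&gt;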
&lt;br /&gt;
Now, we install abipy and its dependency pymatgen using &amp;lt;code&amp;gt;pip&amp;lt;/code&amp;gt;. Here make sure that you are using the &amp;lt;code&amp;gt;pip&amp;lt;/code&amp;gt; version provided by Anaconda and not your system version.&lt;br /&gt;
&lt;br /&gt;
 pip install pymatgen&lt;br /&gt;
 pip install abipy&lt;br /&gt;
&lt;br /&gt;
Finally, we are ready to install yambopy:&lt;br /&gt;
&lt;br /&gt;
 git clone https://github.com/yambo-code/yambopy.git &lt;br /&gt;
&lt;br /&gt;
(or download and extract tarball) and follow the steps outlined in the quick installation section.&lt;br /&gt;
&lt;br /&gt;
Now enter into the yambopy folder and install&lt;br /&gt;
&lt;br /&gt;
 cd yambopy&lt;br /&gt;
 sudo python setup.py install&lt;br /&gt;
&lt;br /&gt;
If you don&#039;t have administrative privileges (for example on a computing cluster), type instead&lt;br /&gt;
&lt;br /&gt;
 cd yambopy&lt;br /&gt;
 python setup.py install --user&lt;br /&gt;
&lt;br /&gt;
===Frequent issues===&lt;br /&gt;
When running the installation you may get a &amp;lt;code&amp;gt;SyntaxError&amp;lt;/code&amp;gt; related to utf-8 encoding or it may complain that module &amp;lt;code&amp;gt;setuptools&amp;lt;/code&amp;gt; is not installed even though it is. In this case, it means that the &amp;lt;code&amp;gt;sudo&amp;lt;/code&amp;gt; command is not preserving the correct &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt; for your python executable.&lt;br /&gt;
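&lt;br /&gt;
To check whether this is indeed what is happening, you can compare the python interpreter found by your shell with the one seen under &amp;lt;code&amp;gt;sudo&amp;lt;/code&amp;gt; (a quick diagnostic, not part of the installation itself):&lt;br /&gt;
 which python&lt;br /&gt;
 sudo which python&lt;br /&gt;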
&lt;br /&gt;
Solve the problem by running the installation step as&lt;br /&gt;
&lt;br /&gt;
 sudo /your/path/to/python setup.py install&lt;br /&gt;
or&lt;br /&gt;
 sudo env PATH=$PATH python setup.py install&lt;br /&gt;
&lt;br /&gt;
This applies only to the installation step and not to subsequent yambopy use.&lt;br /&gt;
&lt;br /&gt;
== Tutorial files ==&lt;br /&gt;
The tutorial CORE databases can be obtained&lt;br /&gt;
&lt;br /&gt;
* from the [[Yambo_Virtual_Machine|Yambo Virtual Machine]] &lt;br /&gt;
* from the Yambo web-page&lt;br /&gt;
* from the Yambo GIT tutorial repository &lt;br /&gt;
&lt;br /&gt;
=== From the Yambo Virtual Machine (VM)  ===&lt;br /&gt;
If you are using the VM, a recent version of the tutorial files is provided. Follow these [[Yambo_Virtual_Machine#Updating_the_Yambo_tutorial_files| instructions]] to update the tutorial files to the most recent version.&lt;br /&gt;
&lt;br /&gt;
=== From the Yambo website ===&lt;br /&gt;
If you are using your own installation or the docker, the files needed to run the tutorials can be downloaded from the lists below. &lt;br /&gt;
&lt;br /&gt;
After downloading the tar.gz files, just unpack them in the &#039;&#039;&#039;YAMBO_TUTORIALS&#039;&#039;&#039; folder. For example:&lt;br /&gt;
 $ mkdir YAMBO_TUTORIALS&lt;br /&gt;
 $ mv hBN.tar.gz YAMBO_TUTORIALS&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ tar -xzvf hBN.tar.gz&lt;br /&gt;
 $ ls&lt;br /&gt;
   hBN&lt;br /&gt;
&lt;br /&gt;
====Files needed for modular tutorials====&lt;br /&gt;
All of the following should be downloaded prior to following the modular tutorials:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Tutorial !! File(s)&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;4&amp;quot;| hBN || [https://media.yambo-code.eu/educational/tutorials/files/hBN.tar.gz hBN.tar.gz] &lt;br /&gt;
|-&lt;br /&gt;
| [https://media.yambo-code.eu/educational/tutorials/files/hBN-convergence-kpoints.tar.gz hBN-convergence-kpoints.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| [https://media.yambo-code.eu/educational/tutorials/files/hBN-2D.tar.gz hBN-2D.tar.gz] &lt;br /&gt;
|-&lt;br /&gt;
| [https://media.yambo-code.eu/educational/tutorials/files/hBN-2D-para.tar.gz hBN-2D-para.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
====Files needed for stand-alone tutorials====&lt;br /&gt;
At the start of each tutorial you will be told which specific file needs to be downloaded:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Tutorial !! File(s)&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot;| Silicon || [https://media.yambo-code.eu/educational/tutorials/files/Silicon.tar.gz Silicon.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
|[https://media.yambo-code.eu/educational/tutorials/files/Silicon_Electron-Phonon.tar.gz Silicon_Electron-Phonon.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| LiF || [https://media.yambo-code.eu/educational/tutorials/files/LiF.tar.gz LiF.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| Aluminum || [https://media.yambo-code.eu/educational/tutorials/files/Aluminum.tar.gz Aluminum.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| GaSb || [https://media.yambo-code.eu/educational/tutorials/files/GaSb.tar.gz GaSb.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| AlAs || [https://media.yambo-code.eu/educational/tutorials/files/AlAs.tar.gz AlAs.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| Hydrogen_Chain || [https://media.yambo-code.eu/educational/tutorials/files/Hydrogen_Chain.tar.gz Hydrogen_Chain.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| MoS2 for HPC || [https://media.yambo-code.eu/educational/tutorials/files/MoS2_HPC_tutorial.tar.gz MoS2_HPC_tutorial.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| MoS2 for HPC shorter version || [https://media.yambo-code.eu/educational/tutorials/files/MoS2_2Dquasiparticle_tutorial.tar.gz MoS2_2Dquasiparticle_tutorial.tar.gz]&lt;br /&gt;
|-&lt;br /&gt;
| Yambopy for QE || [https://media.yambo-code.eu/educational/tutorials/files/databases_qepy.tar databases_qepy]&lt;br /&gt;
|-&lt;br /&gt;
| Yambopy for YAMBO || [https://media.yambo-code.eu/educational/tutorials/files/databases_yambopy.tar databases_yambopy]&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== From the Git Tutorial Repository (advanced users) ===&lt;br /&gt;
If you are using your own installation or the docker, the [https://github.com/yambo-code/tutorials tutorials repository] contains the updated tutorial CORE databases. To use it:&lt;br /&gt;
 $ git clone https://github.com/yambo-code/tutorials.git YAMBO_TUTORIALS&lt;br /&gt;
 $ cd YAMBO_TUTORIALS&lt;br /&gt;
 $ ./setup.pl -install&lt;br /&gt;
&lt;br /&gt;
== Stand-alone tutorials ==&lt;br /&gt;
These tutorials are self-contained and cover a variety of mixed topics, both physical and methodological. They are designed to be followed from start to finish in one page and do not require previous knowledge of yambo. Each tutorial requires downloading a specific core database and typically covers a specific physical system (like bulk GaSb or a hydrogen chain). Ground state input files and pseudopotentials are provided. Output files are also provided for reference.&lt;br /&gt;
&lt;br /&gt;
These tutorials can be accessed directly from this page or from the side bar. They cover different kinds of subjects:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Warning&#039;&#039;&#039;: These tutorials were prepared using previous versions of the Yambo code: some command lines, variables, reports and outputs may differ slightly from the latest version of the code. Scripts for parsing the output may no longer work and should be edited to match the new outputs. The updated command lines can be listed by typing &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Basic ===&lt;br /&gt;
* [[LiF|Linear Response in 3D. Excitons at work]]&lt;br /&gt;
* [[Silicon|GW convergence]]&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* [[Si_Surface|Linear Response in 2D]]&lt;br /&gt;
* [[Si_wire|Linear Response in 1D]]&lt;br /&gt;
* [[H2|Linear Response in 0D]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Post Processing ===&lt;br /&gt;
* [[Yambo Post Processing (ypp)]]&lt;br /&gt;
* [https://www.yambo-code.org/wiki/index.php?title=First_steps_in_Yambopy First Steps in YamboPy]&lt;br /&gt;
* [https://www.yambo-code.org/wiki/index.php?title=Yambopy_tutorial:_band_structures Yambopy tutorial: band structures]&lt;br /&gt;
* [https://www.yambo-code.org/wiki/index.php?title=Yambopy_tutorial:_Yambo_databases Yambopy tutorial: Yambo databases]&lt;br /&gt;
&lt;br /&gt;
=== Advanced ===&lt;br /&gt;
* [[Hydrogen chain|TDDFT Failure and long range correlations]]&lt;br /&gt;
* [[Linear response from real time simulations]]&lt;br /&gt;
&lt;br /&gt;
==== GW and Quasi-particles ====&lt;br /&gt;
* [[Real_Axis_and_Lifetimes|Real Axis and Lifetimes]]&lt;br /&gt;
* [[Self-consistent GW on eigenvalues only]]&lt;br /&gt;
* [[GW tutorial on HPC]]&lt;br /&gt;
&lt;br /&gt;
==== Electron phonon coupling ====&lt;br /&gt;
* [[Electron Phonon Coupling|Electron Phonon Coupling]]&lt;br /&gt;
* [[Optical properties at finite temperature]]&lt;br /&gt;
* [[Phonon-assisted luminescence by finite atomic displacements]]&lt;br /&gt;
* [[Exciton-phonon coupling and luminescence]]&lt;br /&gt;
&lt;br /&gt;
==== Non linear response ====&lt;br /&gt;
* [http://www.attaccalite.com/lumen/linear_response.html Linear response using Dynamical Berry phase]&lt;br /&gt;
* [[Real time approach to non-linear response]]&lt;br /&gt;
* [[Correlation effects in the non-linear response]]&lt;br /&gt;
* [http://www.attaccalite.com/lumen/thg_in_silicon.html Third Harmonic Generation]&lt;br /&gt;
* [http://www.attaccalite.com/lumen/spin_orbit.html Spin-orbit coupling and non-linear response]&lt;br /&gt;
* [[Two-photon absorption]] &lt;br /&gt;
* [[Pump and Probe]]&lt;br /&gt;
* [[Parallelization for non-linear response calculations]]&lt;br /&gt;
&lt;br /&gt;
==== Developing Yambo ====&lt;br /&gt;
* [[How to create a new project in Yambo]]&lt;br /&gt;
* [[How to create a new ypp interface]]&lt;br /&gt;
* [[Some hints on github]]&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
* [[SOC|Spin-Orbit Coupling MBPT]]&lt;br /&gt;
* [[Kerr|Kerr]]&lt;br /&gt;
* [[Real_Time|Real-Time]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- For each TUTORIAL (Solid_LiF, Solid_Al, ...) , therefore, you can download the ground state files (zip archive named TUTORIAL_ground_state.zip) and generate the Yambo databases from your own by running abinit/PWscf and a2y/p2y. In this case the Yambo input and reference files are contained in the zip file (TUTORIAL_reference_files.zip). Alternatively, if (and only if) you have compiled yambo with the NetCDF support you can directly download the zip files containing the Yambo core databases (TUTORIAL_NETCDF_databases_and_reference_files.zip). These are generated using the NetCDF interface in order to be readable in any platform.&lt;br /&gt;
After you have downloaded the tutorial zip files and unziped them you should have now a tutorial tree:&lt;br /&gt;
localhost:&amp;gt; ls &lt;br /&gt;
YAMBO_TUTORIALS/&lt;br /&gt;
localhost:&amp;gt; ls  YAMBO_TUTORIALS/&lt;br /&gt;
COPYING  Fantastic_Dimensions/  Hydrogen_Chain/  README  Solid_LiF/ Solid_Al/ SiH4/ ...&lt;br /&gt;
In each folder you will find an Abinit or Pwscf subfolder in case you have downloaded the ground state zip files and the YAMBO subfolder. The tutorials start by entering the YAMBO subfolder and followinf the informations provided in the tutorial documentation.  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Modular tutorials ==&lt;br /&gt;
These tutorials are designed to provide a deeper understanding of specific yambo tasks and runlevels, and are structured to avoid repetition of common procedures and physical concepts. As such, they make use of the same physical systems: bulk hexagonal boron nitride &#039;&#039;hBN&#039;&#039; and an hBN sheet &#039;&#039;hBN-2D&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Warning&#039;&#039;&#039;: These tutorials were prepared using previous versions of the Yambo code: some command lines, variables, reports and outputs may differ slightly from the latest version of the code. Scripts for parsing the output may no longer work and should be edited to match the new outputs. The updated command lines can be listed by typing &amp;lt;code&amp;gt;yambo -h&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Introduction====&lt;br /&gt;
* [[First steps: a walk through from DFT to optical properties]]&lt;br /&gt;
====Quasiparticles in the GW approximation====&lt;br /&gt;
* [[How to obtain the quasi-particle band structure of a bulk material: h-BN]]&lt;br /&gt;
====Using Yambo in Parallel====&lt;br /&gt;
These modules contain a very general discussion of the parallel environment of Yambo. However, the actual runs of the code are specific to the CECAM cluster. If you want to run these modules elsewhere, just replace the parallel queue instructions with simple MPI commands (see the example below the list).&lt;br /&gt;
&lt;br /&gt;
* [[GW_parallel_strategies|Parallel GW (CECAM specific)]]: strategies for running Yambo in parallel&lt;br /&gt;
[[GW_parallel_strategies_CECAM]]&lt;br /&gt;
* [[Pushing_convergence_in_parallel|GW convergence (CECAM specific)]]: use Yambo in parallel to converge a GW calculation for a layer of hBN (hBN-2D)&lt;br /&gt;
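&lt;br /&gt;
For instance, where these modules ask you to submit a job to the CECAM queue, you could instead launch Yambo directly through MPI (a generic sketch; adapt the number of tasks and the input/job names to your case):&lt;br /&gt;
 mpirun -np 4 yambo -F yambo.in -J run1&lt;br /&gt;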
&lt;br /&gt;
====Excitons and the Bethe-Salpeter Equation====&lt;br /&gt;
* [[How to obtain an optical spectrum|Calculating optical spectra including excitonic effects: a step-by-step guide]]&lt;br /&gt;
* [[How to choose the input parameters|Obtaining a converged optical spectrum]] &lt;br /&gt;
* [[How to treat low dimensional systems|Many-body effects in low-dimensional systems: numerical issues and remedies]] &lt;br /&gt;
* [[How to analyse excitons|Analysis of excitonic spectra in a 2D material]]&lt;br /&gt;
&amp;lt;!--* [[Two particle excitations]] (try to bypass this page) : Learn how to set up and run calculations to obtain and analyze an optical absorption spectrum of bulk and low dimension materials by using the Bethe-Salpeter equation--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Yambopy====&lt;br /&gt;
* [[First steps in Yambopy]]&lt;br /&gt;
* [[GW tutorial. Convergence and approximations (BN)]]&lt;br /&gt;
* [[Bethe-Salpeter equation tutorial. Optical absorption (BN)]]&lt;br /&gt;
* [[Yambopy tutorial: band structures | Database and plotting tutorial for quantum espresso: qepy]]&lt;br /&gt;
* [[Yambopy tutorial: Yambo databases | Database and plotting tutorial for yambo: yambopy ]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
====Real-time simulations====&lt;br /&gt;
* [[Breaking of symmetries]]&lt;br /&gt;
* [[Independent-Particle Approximation Dynamics. Delta Pulse]]&lt;br /&gt;
* [[Post-processing. Optical Response]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Modules ===&lt;br /&gt;
Alternatively, users can learn more about a specific runlevel or task by looking at the individual &#039;&#039;&#039;[[Modules|documentation modules]]&#039;&#039;&#039;. These provide a focus on the input parameters, run time behaviour, and underlying physics. Although they can be followed separately, non-experts are urged to follow them as part of the more structured tutorials given above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== &amp;lt;span id=&amp;quot;Schools&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Schools ==&lt;br /&gt;
* [[ICTP2020]]&lt;br /&gt;
* [[CECAM VIRTUAL 2021]]&lt;br /&gt;
* [https://www.yambo-code.org/wiki/index.php?title=ICTP_2022 ICTP2022]&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6915</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6915"/>
		<updated>2023-05-23T10:20:46Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*GW acceleration techniques, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms present in Yambo that lead to a speed up of the self-energy convergence with respect to the k-grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already being converged. So, compute the gap at GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
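&lt;br /&gt;
Assuming the usual layout of the &amp;lt;code&amp;gt;.qp&amp;lt;/code&amp;gt; output, with one row per band reporting the DFT energy Eo and the GW correction E-Eo (both in eV), the direct gap at k-point 7 is obtained as&lt;br /&gt;
&lt;br /&gt;
 E_gap(GW) = [ Eo + (E-Eo) ]_(band 14)  -  [ Eo + (E-Eo) ]_(band 13)&lt;br /&gt;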
&lt;br /&gt;
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the calculation in the k-grid to achieve a speed-up, consider briefly again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. In Yambo this integration is usually performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code loses the part of the integral that lies outside the sphere. This is usually&lt;br /&gt;
not problematic because for a large number of q and k points the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the gamma point, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential that we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the Monte Carlo integration. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a dense k-point grid the final result will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or even with the gamma point only, and it is applied to both the exchange and correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 20 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;none&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 2.598 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work as well for the correlation part of the self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS2&lt;br /&gt;
 with respect to the number of sampling points of the&lt;br /&gt;
BZ. Blue lines: standard integration methods. The extrapolated values are indicated with&lt;br /&gt;
a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show&lt;br /&gt;
the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which performs the integration of &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only of &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag to your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation using the RIM-W algorithm: &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
&lt;br /&gt;
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use a parallel strategy in Yambo. As a test calculation, we compute the full band structure over a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Unlike the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So now exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_HPC_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculation of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; refers to the screening function itself (it stands for χ since it is a response function), and &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; to the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
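&lt;br /&gt;
As an illustration (not a prescription), with 8 MPI tasks one could split the response function over k-points and conduction bands by setting&lt;br /&gt;
&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 2 4 1&amp;quot;         # q g k c v&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;&lt;br /&gt;
&lt;br /&gt;
keeping in mind that the product of the entries in each CPU string should match the total number of MPI tasks (here 1x1x2x4x1 = 8).&lt;br /&gt;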
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
We start by calculating the QP corrections using only MPI tasks and a single OpenMP thread. Therefore, edit the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=8&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
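&lt;br /&gt;
When the job has finished, a quick way to compare different runs is to look at the end of the report file, where the total timing is summarised; for example (adjust the folder name to where your report file actually sits):&lt;br /&gt;
 tail -n 30 run_MPI8_OMP1.out/r-*&lt;br /&gt;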
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to enable OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks should equal the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute the 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means “use the value of OMP_NUM_THREADS”, and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
In real-life calculations running on a large number of cores, as we have seen, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower memory requirements within a node. You can try to increase the OpenMP share of threads if getting Out Of Memory errors.&lt;br /&gt;
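&lt;br /&gt;
For instance, still using 8 cores per node, a more memory-friendly split could look like this (a sketch to adapt to your machine):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=4&lt;br /&gt;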
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of yambo: running on GPUs instead of CPUs. This usually leads to extreme speedups in the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In yambo, each GPU corresponds to a single MPI task; here we request &amp;lt;code&amp;gt;ntasks-per-node=2&amp;lt;/code&amp;gt;. OpenMP threading is allowed, but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP at all, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may be the same with no need to load additional modules. In conclusion, in this example we are requesting 4 GPU cards and running with 2 MPI tasks and a single OpenMP thread per task.&lt;br /&gt;
&lt;br /&gt;
The calculation should be faster: about 2 minutes instead of the 2 minutes and 30 seconds of the purely MPI calculation. The gain can become even greater in larger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
	<entry>
		<id>https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6914</id>
		<title>Quasi-particles of a 2D system</title>
		<link rel="alternate" type="text/html" href="https://wiki.yambo-code.eu/wiki/index.php?title=Quasi-particles_of_a_2D_system&amp;diff=6914"/>
		<updated>2023-05-23T10:20:18Z</updated>

		<summary type="html">&lt;p&gt;Giacomo.sesti: /* Step 1:  Speeding up the self-energy convergence */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:MoS2.png|thumb| MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; monolayer (top and side views). Gray: Mo atoms, yellow: S atoms.]]&lt;br /&gt;
&lt;br /&gt;
In this tutorial you will compute the quasiparticle corrections to the band structure of a free-standing single layer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;. The aim of the tutorial is to learn how to efficiently run a GW simulation for a 2D material, based on:&lt;br /&gt;
*GW acceleration techniques, some of which are specific to 2D systems&lt;br /&gt;
*Parallelization techniques&lt;br /&gt;
&lt;br /&gt;
In the end, you will obtain a quasiparticle band structure based on the simulations, the first step towards the reproduction of an ARPES spectrum. &lt;br /&gt;
Beware: we won’t use fully converged parameters, so the final result should not be considered very accurate.&lt;br /&gt;
&lt;br /&gt;
==Step 1:  Speeding up the self-energy convergence in the k-grid ==&lt;br /&gt;
&lt;br /&gt;
In this section, you will learn to use two algorithms available in Yambo that speed up the convergence of the self-energy with respect to the k-point grid.&lt;br /&gt;
&lt;br /&gt;
To appreciate the impact of these algorithms, let us first perform a GW computation for the monolayer of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Enter the &amp;lt;code&amp;gt; 01_GW_first_run&amp;lt;/code&amp;gt; folder and generate the input file:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p&lt;br /&gt;
&lt;br /&gt;
Modify the input file as follows:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 80 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;BG&amp;quot;                  # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
Here, consider the number of G vectors and the number of bands as already converged. So, compute the gap at the GW level:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry&lt;br /&gt;
&lt;br /&gt;
Once the computation has finished, you can inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry.qp&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
You should have obtained a GW gap of 2.483 eV.&lt;br /&gt;
&lt;br /&gt;
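If you prefer to extract the gap rather than reading it by eye, a one-liner like the following can be used. This is only a sketch: it assumes the usual column layout of the &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; file, with the band index in the second column, &amp;lt;code&amp;gt;Eo&amp;lt;/code&amp;gt; in the third and &amp;lt;code&amp;gt;E-Eo&amp;lt;/code&amp;gt; in the fourth, so check the header of your own file first:&lt;br /&gt;
&lt;br /&gt;
 # sketch: GW gap = quasiparticle energy (Eo + E-Eo) of band 14 minus that of band 13&lt;br /&gt;
 grep -v &#039;^#&#039; o-80b_10Ry.qp | awk &#039;$2==13 {v=$3+$4} $2==14 {c=$3+$4} END {print &amp;quot;GW gap (eV):&amp;quot;, c-v}&#039;&lt;br /&gt;
&lt;br /&gt;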
==RIM==&lt;br /&gt;
[[File:Circle box.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
To understand how we can improve the convergence with respect to the k-grid and achieve a speed-up, briefly consider again the expression of the exchange part of the self-energy&lt;br /&gt;
&lt;br /&gt;
[[File:Sx.png|none|x50px|caption]]&lt;br /&gt;
&lt;br /&gt;
You can notice that the integration around q=0 is problematic because of the diverging 1/q factor coming from the Coulomb potential, while all the other terms are slowly varying in q. Usually, in Yambo this integration is performed analytically in a small sphere around q=0.&lt;br /&gt;
&lt;br /&gt;
However, in this way the code misses the part of the integral that lies outside the sphere. This is usually&lt;br /&gt;
not problematic because, for a large number of q- and k-points, the missing term goes to zero.&lt;br /&gt;
However, in systems that require few k-points, or even only the Gamma point, it is possible&lt;br /&gt;
to perform a better integration of this term by adding the flag &amp;lt;code&amp;gt;-r&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r&lt;br /&gt;
&lt;br /&gt;
In this manner, you generate the input:&lt;br /&gt;
&lt;br /&gt;
 HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc&lt;br /&gt;
 ppa                          # [R Xp] Plasmon Pole Approximation&lt;br /&gt;
 gw0                          # [R GW] G0W0 Quasiparticle energy levels&lt;br /&gt;
 em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rim_cut&amp;lt;/span&amp;gt;                      # [R] Coulomb potential&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandQpts=  1000000&amp;lt;/span&amp;gt;           # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvec=  100           RL&amp;lt;/span&amp;gt;  # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;CUTGeo= &amp;quot;slab z&amp;quot;&amp;lt;/span&amp;gt;             # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this input, &amp;lt;code&amp;gt;[[Variables#RandGvec|RandGvec]]&amp;lt;/code&amp;gt; is the number of components of the Coulomb potential that we want to integrate numerically, and &amp;lt;code&amp;gt;[[Variables#RandQpts|RandQpts]]&amp;lt;/code&amp;gt; is the number of random points used to perform the integral via Monte Carlo. &lt;br /&gt;
The [CUT] keyword refers to the truncation of the Coulomb interaction, used to avoid spurious interactions between periodically repeated copies of the simulation supercell along the z-direction (which in this case is the non-periodic direction). Keep in mind that the vacuum space between two copies of the system should be converged: here we are using 20 bohr, but a value of 40 bohr would be more realistic. &lt;br /&gt;
&lt;br /&gt;
If you turn on this integration you will get a slightly different band gap, but in the limit of a dense k-point grid the final results will be the same as with the standard method.&lt;br /&gt;
However, this correction is important for systems that converge with few k-points, or with Gamma only, and it is applied to both the exchange and the correlation parts of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Now run a GW computation using the RIM method:&lt;br /&gt;
&lt;br /&gt;
 gw0                              # [R] GW approximation&lt;br /&gt;
 ppa                              # [R][Xp] Plasmon Pole Approximation for the Screened Interaction&lt;br /&gt;
 dyson                            # [R] Dyson Equation solver&lt;br /&gt;
 HF_and_locXC                     # [R] Hartree-Fock&lt;br /&gt;
 em1d                             # [R][X] Dynamically Screened Interaction&lt;br /&gt;
 X_Threads=0                      # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 SE_Threads=0                     # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
 rim_cut                          # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                 # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL    # [RIM] Coulomb interaction RS components&lt;br /&gt;
 CUTGeo= &amp;quot;slab z&amp;quot;                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..&lt;br /&gt;
 % CUTBox&lt;br /&gt;
  0.000000 | 0.000000 | 0.000000 |        # [CUT] [au] Box sides&lt;br /&gt;
 %&lt;br /&gt;
 CUTRadius= 0.000000              # [CUT] [au] Sphere/Cylinder radius&lt;br /&gt;
 CUTCylLen= 0.000000              # [CUT] [au] Cylinder length&lt;br /&gt;
 CUTwsGvec= 0.700000              # [CUT] WS cutoff: number of G to be modified&lt;br /&gt;
 EXXRLvcs=  37965           RL    # [XX] Exchange    RL components&lt;br /&gt;
 VXCRLvcs=  37965           RL    # [XC] XCpotential RL components&lt;br /&gt;
 Chimod= &amp;quot;HARTREE&amp;quot;                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc&lt;br /&gt;
 % BndsRnXp&lt;br /&gt;
    1 | 80 |                       # [Xp] Polarization function bands&lt;br /&gt;
 %&lt;br /&gt;
 NGsBlkXp= 10                Ry    # [Xp] Response block size&lt;br /&gt;
 % LongDrXp&lt;br /&gt;
 1.000000 | 1.000000 | 1.000000 |        # [Xp] [cc] Electric Field&lt;br /&gt;
 %&lt;br /&gt;
 PPAPntXp= 27.21138         eV    # [Xp] PPA imaginary energy&lt;br /&gt;
 XTermKind= &amp;quot;none&amp;quot;                # [X] X terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze)&lt;br /&gt;
 % GbndRnge&lt;br /&gt;
    1 | 20 |                         # [GW] G[W] bands range&lt;br /&gt;
 %&lt;br /&gt;
 GTermKind= &amp;quot;none&amp;quot;                # [GW] GW terminator (&amp;quot;none&amp;quot;,&amp;quot;BG&amp;quot; Bruneval-Gonze,&amp;quot;BRS&amp;quot; Berger-Reining-Sottile)&lt;br /&gt;
 DysSolver= &amp;quot;n&amp;quot;                   # [GW] Dyson Equation solver (&amp;quot;n&amp;quot;,&amp;quot;s&amp;quot;,&amp;quot;g&amp;quot;)&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 7|7|13|14|&lt;br /&gt;
 %&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rim&lt;br /&gt;
&lt;br /&gt;
You can now inspect the output file &amp;lt;code&amp;gt;o-80b_10Ry_rim.qp&amp;lt;/code&amp;gt;. The GW gap should have increased to 2.598 eV.&lt;br /&gt;
&lt;br /&gt;
==RIM-W==&lt;br /&gt;
&lt;br /&gt;
In 2D systems, the improvement granted by the better integration of the Coulomb potential around q=0 via the RIM approach does not work well for the correlation part of the self-energy &lt;br /&gt;
&lt;br /&gt;
[[File:Sigma_c.png|none|x50px|caption]] &lt;br /&gt;
&lt;br /&gt;
[[File:Rimw.png|thumb| Convergence of the quasiparticle band gap of MoS&amp;lt;sub&amp;gt;2&amp;lt;/sub&amp;gt; with respect to the number of sampling points of the BZ. Blue lines: standard integration methods. The extrapolated values are indicated with a horizontal dashed line. Orange lines: RIM-W method. The grey shaded regions show the convergence tolerance (±50 meV) centered at the extrapolated values.]]&lt;br /&gt;
&lt;br /&gt;
as the dielectric function &amp;lt;math&amp;gt;\epsilon_{00}&amp;lt;/math&amp;gt; goes as q around q=0, matching the 1/q behavior of the Coulomb potential. &lt;br /&gt;
&lt;br /&gt;
To solve this issue, a method specific to 2D systems has been developed, which integrates &amp;lt;math&amp;gt;\epsilon_{0 0}(\mathbf{q}) \ v(\mathbf{q})&amp;lt;/math&amp;gt; rather than only &amp;lt;math&amp;gt;v(\mathbf{q})&amp;lt;/math&amp;gt; when computing the correlation part of the self-energy.&lt;br /&gt;
&lt;br /&gt;
Details of this method can be found in the paper [https://www.nature.com/articles/s41524-023-00989-7 Efficient GW calculations in two-dimensional materials through a stochastic integration of the screened potential]. You can activate the RIM-W algorithm by adding a further flag to your input:&lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -p p -r -rw&lt;br /&gt;
 &lt;br /&gt;
In your input three further lines should have appeared:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 rim_cut                           # [R] Coulomb potential&lt;br /&gt;
 RandQpts=1000000                  # [RIM] Number of random q-points in the BZ&lt;br /&gt;
 RandGvec= 100              RL     # [RIM] Coulomb interaction RS components&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RIM_W&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;RandGvecW= 15              RL&amp;lt;/span&amp;gt;&lt;br /&gt;
 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;rimw_type=&amp;quot;semiconductor&amp;quot;&amp;lt;/span&amp;gt;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The variable &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; defines the number of G vectors used to integrate W numerically. &amp;lt;code&amp;gt;RandGvecW&amp;lt;/code&amp;gt; &#039;&#039;&#039;must&#039;&#039;&#039; always be smaller than the size of the &amp;lt;math&amp;gt;\chi&amp;lt;/math&amp;gt; matrix.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now repeat the computation, this time using the RIM-W algorithm. &lt;br /&gt;
&lt;br /&gt;
 yambo -F gw_ppa.in -J 80b_10Ry_rimw&lt;br /&gt;
&lt;br /&gt;
How much has the band gap changed?&lt;br /&gt;
&lt;br /&gt;
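If you have kept the three runs (no RIM, RIM, RIM-W) in the same folder, you can compare the gaps side by side. As before, this is only a sketch that assumes the standard column layout of the &amp;lt;code&amp;gt;o-*.qp&amp;lt;/code&amp;gt; files:&lt;br /&gt;
&lt;br /&gt;
 # sketch: print the GW gap of each run&lt;br /&gt;
 for f in o-80b_10Ry.qp o-80b_10Ry_rim.qp o-80b_10Ry_rimw.qp; do&lt;br /&gt;
   grep -v &#039;^#&#039; $f | awk -v run=$f &#039;$2==13 {v=$3+$4} $2==14 {c=$3+$4} END {print run, &amp;quot;gap (eV):&amp;quot;, c-v}&#039;&lt;br /&gt;
 done&lt;br /&gt;
&lt;br /&gt;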
==Step 2: GW parallel strategies==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will see how to use parallel strategies in Yambo. As a test calculation, we compute the full band structure on a larger number of bands with respect to the previous calculation. &lt;br /&gt;
&lt;br /&gt;
Differently from the approach used up to now, we will not work interactively but rather submit a job script, as is commonly done when using Yambo on clusters. So now exit the interactive mode and, from the login node, access the folder:&lt;br /&gt;
&lt;br /&gt;
 cd $CINECA_SCRATCH/YAMBO_TUTORIALS/MoS2_HPC_tutorial/02_GW_parallel&lt;br /&gt;
&lt;br /&gt;
Here, inspect the input &amp;lt;code&amp;gt;gw.in&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 %QPkrange                        # [GW] QP generalized Kpoint/Band indices&lt;br /&gt;
 1|7|13|17|&lt;br /&gt;
 %&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will be using the slurm submission script job_parallel.sh, which is available in the calculation directory.&lt;br /&gt;
If you inspect it, you will see that the script adds additional variables to the yambo input file.&lt;br /&gt;
These variables control the parallel execution of the code:&lt;br /&gt;
&lt;br /&gt;
 DIP_CPU= &amp;quot;1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 DIP_ROLEs= &amp;quot;k c v&amp;quot;         # [PARALLEL] CPUs roles (k,c,v)&lt;br /&gt;
 DIP_Threads=  0            # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 X_and_IO_CPU= &amp;quot;1 1 1 $ncpu 1&amp;quot;     # [PARALLEL] CPUs for each role&lt;br /&gt;
 X_and_IO_ROLEs= &amp;quot;q g k c v&amp;quot;       # [PARALLEL] CPUs roles (q,g,k,c,v)&lt;br /&gt;
 X_and_IO_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra&lt;br /&gt;
 X_Threads=  0              # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 SE_CPU= &amp;quot; 1 $ncpu 1&amp;quot;       # [PARALLEL] CPUs for each role&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
 SE_Threads=  0             # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
The keyword &amp;lt;code&amp;gt;DIP&amp;lt;/code&amp;gt; refers to the calculations of the screening matrix elements (also called “dipoles”) needed for the screening function, &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is the screening function (it stands for χ since it is a response function), &amp;lt;code&amp;gt;SE&amp;lt;/code&amp;gt; the self-energy.&lt;br /&gt;
These three sections of the code can be parallelised independently.&lt;br /&gt;
&lt;br /&gt;
The [PARALLEL] variables refer to MPI tasks, while the threads are used for [OPENMP] parallelisation.&lt;br /&gt;
&lt;br /&gt;
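To give an idea of how these variables combine, with 8 MPI tasks one could also split the self-energy over both quasiparticle states and bands instead of over a single role; the product of the numbers assigned to the roles has to match the number of MPI tasks given to that part of the code. The values below are only an illustration, not a recommended setup:&lt;br /&gt;
&lt;br /&gt;
 SE_CPU= &amp;quot;1 2 4&amp;quot;            # [PARALLEL] 1 task for q, 2 for qp, 4 for b (1x2x4 = 8 MPI tasks)&lt;br /&gt;
 SE_ROLEs= &amp;quot;q qp b&amp;quot;         # [PARALLEL] CPUs roles (q,qp,b)&lt;br /&gt;
&lt;br /&gt;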
We start by calculating the QP corrections using MPI tasks only, with a single OpenMP thread. Therefore, edit the submission script as:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=8&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
and submit the job&lt;br /&gt;
 sbatch job_parallel.sh&lt;br /&gt;
&lt;br /&gt;
This will create a new input file and run it. The calculation databases and the human-readable files will be put in separate directories. Check the location of the report &amp;lt;code&amp;gt;r-*&amp;lt;/code&amp;gt; file and the log &amp;lt;code&amp;gt;l-*&amp;lt;/code&amp;gt; files, and inspect them while the calculation runs (it should take a couple of minutes).&lt;br /&gt;
&lt;br /&gt;
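To check whether the job is still queued or already running, the usual Slurm commands work as expected, for instance:&lt;br /&gt;
&lt;br /&gt;
 squeue -u $USER      # list your jobs and their state (PD = pending, R = running)&lt;br /&gt;
&lt;br /&gt;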
For simplicity you could just type&lt;br /&gt;
&lt;br /&gt;
 tail -f run_MPI8_OMP1.out/LOG/l-*_CPU_1&lt;br /&gt;
&lt;br /&gt;
to monitor the progress in the master thread (&amp;lt;code&amp;gt;Ctrl+C&amp;lt;/code&amp;gt; to exit). As you can see, the run takes some time, even though we are using minimal parameters.&lt;br /&gt;
&lt;br /&gt;
On a cluster like m100, to exploit the full potential of the machine, it is useful to activate OpenMP threads by modifying cpus-per-task in the submission file. The product of the number of OpenMP threads and MPI tasks should equal the total number of CPUs. &lt;br /&gt;
&lt;br /&gt;
Therefore, we can distribute the 8 CPUs over 4 MPI tasks, each using 2 OpenMP threads:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=4&lt;br /&gt;
 #SBATCH --cpus-per-task=2&lt;br /&gt;
&lt;br /&gt;
Actually, we don’t need to change the related OpenMP variables in the yambo input, since the value 0 means &amp;quot;use the value of OMP_NUM_THREADS&amp;quot; and we have now set this environment variable to 2 via our script.&lt;br /&gt;
Otherwise, any positive number can directly specify the number of threads to be used in each section of the code.&lt;br /&gt;
&lt;br /&gt;
 DIP_Threads=  0     # [OPENMP/X] Number of threads for dipoles&lt;br /&gt;
 ...&lt;br /&gt;
 X_Threads=  0       # [OPENMP/X] Number of threads for response functions&lt;br /&gt;
 ...&lt;br /&gt;
 SE_Threads=  0      # [OPENMP/GW] Number of threads for self-energy&lt;br /&gt;
&lt;br /&gt;
You can try to run this calculation and check if it is faster than the pure MPI one from before. In general, you should not expect a massive gain from OpenMP if you are already close to an efficient MPI scaling.&lt;br /&gt;
&lt;br /&gt;
You can also try any other thread combinations including the pure OpenMP scaling, and compare the timings.&lt;br /&gt;
&lt;br /&gt;
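If you want to scan several combinations without editing the script every time, you can override the &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; directives from the command line. This is only a sketch: it assumes the job script derives the MPI and OpenMP counts from the Slurm environment (as the &amp;lt;code&amp;gt;run_MPI*_OMP*&amp;lt;/code&amp;gt; folder names suggest); if it does not, edit the &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; lines by hand instead.&lt;br /&gt;
&lt;br /&gt;
 # sketch: submit a few MPI x OpenMP combinations on the 8 available cores&lt;br /&gt;
 for mpi in 8 4 2 1; do&lt;br /&gt;
   omp=$((8/mpi))&lt;br /&gt;
   sbatch --ntasks-per-node=$mpi --cpus-per-task=$omp job_parallel.sh&lt;br /&gt;
 done&lt;br /&gt;
&lt;br /&gt;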
In real-life calculations running on a large number of cores, as we have seen, it may be a good idea to adopt a hybrid approach. The most efficient scaling can depend both on your system and on the HPC facility you’re running on.&lt;br /&gt;
&lt;br /&gt;
In general, OpenMP can help lower the memory requirements within a node. You can try to increase the OpenMP share of threads if you are getting Out Of Memory errors.&lt;br /&gt;
&lt;br /&gt;
==Step 3: Running on GPU==&lt;br /&gt;
&lt;br /&gt;
For this part of the tutorial, we will repeat the previous calculation, this time making use of GPUs. &lt;br /&gt;
&lt;br /&gt;
So now move to:&lt;br /&gt;
&lt;br /&gt;
 02_GW_gpu&lt;br /&gt;
&lt;br /&gt;
Here we are using yet another capability of Yambo: running on GPUs instead of CPUs. This usually leads to a significant speedup of the calculations. Fortunately, the m100 cluster also has some GPU nodes! So now let us have a look at the submission script:&lt;br /&gt;
&lt;br /&gt;
 vim gpu_job.sh&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 #SBATCH --nodes=1&lt;br /&gt;
 #SBATCH --ntasks-per-node=2&lt;br /&gt;
 #SBATCH --cpus-per-task=1&lt;br /&gt;
 #SBATCH --gres=gpu:4&lt;br /&gt;
&lt;br /&gt;
Each GPU node contains four GPUs (&amp;lt;code&amp;gt;--gres=gpu:4&amp;lt;/code&amp;gt;). In Yambo, each GPU corresponds to a single MPI task, hence &amp;lt;code&amp;gt;ntasks-per-node=2&amp;lt;/code&amp;gt;. OpenMP threading is allowed - but not too much, otherwise we lose efficiency. Here, in fact, we are not using OpenMP at all, so &amp;lt;code&amp;gt;export OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
In addition, you will see that in order to run on GPUs we are now using a different executable than before, obtained with a GPU-specific compilation of the code.&lt;br /&gt;
&lt;br /&gt;
 module purge&lt;br /&gt;
 module load hpc-sdk/2022--binary spectrum_mpi/10.4.0--binary&lt;br /&gt;
 export PATH=/m100_work/tra23_Yambo/softwares/YAMBO/5.2-gpu/bin:$PATH&lt;br /&gt;
&lt;br /&gt;
In general, the GPU compilation might be different on your machine, or the executable may even be the same, with no need to load additional modules. In conclusion, we are running on the GPUs with 2 MPI tasks, each bound to a GPU card, and a single OpenMP thread per task.&lt;br /&gt;
&lt;br /&gt;
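As a quick sanity check that the GPUs are actually visible to the job, you may add a call to the standard &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; tool to the script, just before the yambo line (this assumes NVIDIA GPUs and that the tool is available on the compute nodes):&lt;br /&gt;
&lt;br /&gt;
 nvidia-smi      # list the GPUs allocated to the job before launching yambo&lt;br /&gt;
&lt;br /&gt;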
The calculation should be faster, taking about 2 minutes instead of the 2 minutes and 30 seconds of the purely MPI calculation. The gain can become even larger for bigger systems. You can have a look at the results collected in the folder &amp;lt;code&amp;gt;MPI2_OMP1&amp;lt;/code&amp;gt;. The quasiparticle corrections are stored in human-readable form in the file &amp;lt;code&amp;gt;MPI2_OMP1.out/o-GW_bnds.qp&amp;lt;/code&amp;gt;, and in netCDF format in the quasiparticle database &amp;lt;code&amp;gt;MPI2_OMP1/ndb.QP&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Giacomo.sesti</name></author>
	</entry>
</feed>