Manual:Advanced ways of running Octopus
Octopus can be run in multi-tentacle (parallel) mode if you have configured it that way when you compiled and installed it. There are four possible parallelization strategies: domains, states, spin/k-points, and electron-hole pairs (only for Casida). You do not need to specify in the input file what kind of parallelization strategy you want to use as Octopus can find reasonable defaults. However, the experienced user may want to tune the corresponding variables.
Parallel in States
The first strategy is parallel in states. This means that each processor takes some states and propagates them independently of the others.
To use this include
in your input file.
This is clearly the most efficient (and simplest) way of parallelizing the code if the number of states is much larger than the number of processors available (which is usually the case for us). You can expect a more or less linear speed-up with number of processors until you reach around 5 states/processor. Beyond this it starts to saturate.
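As a rough illustration of the rule of thumb above, the following shell sketch estimates the largest processor count that still leaves about 5 states on each processor. The function name max_procs_for_states and its default threshold are illustrative choices taken from the text, not part of Octopus.

```shell
# Hypothetical helper: largest processor count that still leaves at
# least `min_per_proc` states per processor (default 5, the rule of
# thumb quoted in the text).
max_procs_for_states() {
  nstates=$1
  min_per_proc=${2:-5}
  procs=$(( nstates / min_per_proc ))
  # Always allow at least one processor.
  [ "$procs" -lt 1 ] && procs=1
  echo "$procs"
}

# Example: a calculation with 62 states.
max_procs_for_states 62
```

For 62 states this suggests staying at or below 12 processors for parallelization in states alone.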
Parallel in Domains
When running parallel in "domains", Octopus divides the simulation region (the box) into separate regions (domains) and assigns each of these to a different processor. This not only speeds up the calculation, but also divides the memory among the different processors. The first step of this process, the splitting of the box, is in general a very complicated problem: the simulation box can have an almost arbitrary shape, and the number of processors is arbitrary. Furthermore, as the communication between processors grows proportionally to the surface of the domains, one should use an algorithm that divides the box such that each domain has the same number of points while minimizing the total surface area of the domains. Octopus uses the METIS library for this domain decomposition. METIS is included in the Octopus source code and is used by default.
So for example, if you are planning to run the ground state of Na there's only one choice:
Octopus will then partition the real-space mesh and compute the resulting pieces on different nodes. Later when you plan to do time propagations you have more choices at hand. You could run only parallel in states
= auto = no
or in domains (as above), or you can decide to employ both parallelization strategies
= auto = auto
Problems with parallelization
The parallelization is only effective if each node is given a sufficiently large amount of work. Otherwise communication dominates, and the calculation should be run in serial instead. This happens, for example, if you attempt to parallelize in real-space domains but your grid has too few points. If the code detects that the number of points per node is not large enough, it gives the warning:
** Warning:
** From node = 0
** I have less elements in a parallel group than recommended.
** Maybe you should reduce the number of nodes
Compiling and running in parallel
The steps to compile and use Octopus in parallel are:
1) pass the --enable-mpi option to the configure script
2) give Octopus the libraries, etc. necessary to compile a parallel code. When you install mpich on your system, it creates a series of programs called mpicc, mpif77, mpif90, etc. These are wrappers that call the specific compiler with the appropriate flags, libraries, etc. This means that to compile Octopus in parallel you'll need mpif90. Please check that this is present on your system (perhaps in /usr/local/bin if you compiled mpich from the sources). Octopus will try to find this compiler, but it may not be able to find it for several reasons:
- mpif90 is _not_ in the path.
- there is another f90 compiler specified by the user/system. This means that if you have the variable FC defined when you run the configure script, Octopus will use the fortran compiler specified by that variable, and will not try to find mpif90. Then, the configuration fails.
In either case, set FC yourself before running the configure script:
export FC=<whatever path>/mpif90
export FCFLAGS=<whatever flags>
./configure --enable-mpi
It is wise to also set FCFLAGS, as Octopus cannot tell which Fortran compiler you are using if you compile with mpif90.
3) Now you just have to run Octopus in parallel. This step depends on your actual system; you may have to use mpirun or mpiexec to accomplish it.
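The checks in the steps above can be sketched in shell as follows. This is a minimal sketch, assuming an MPI installation whose launcher accepts the common -np option; the helper name first_available and the process count of 4 are illustrative, and the exact launch command depends on your system.

```shell
# Hypothetical helper: print the first command from the argument list
# that is found in PATH; return non-zero if none of them is found.
first_available() {
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      return 0
    fi
  done
  return 1
}

# Check for the Fortran wrapper needed at configure time.
if first_available mpif90 >/dev/null; then
  echo "mpif90 found: $(command -v mpif90)"
else
  echo "mpif90 not in PATH; set FC to its full path before running configure"
fi

# Pick a launcher and show the command that would start Octopus
# on 4 processes (the count is only an example).
if launcher=$(first_available mpirun mpiexec); then
  echo "run with: $launcher -np 4 octopus"
else
  echo "no MPI launcher found in PATH"
fi
```

The sketch only prints what it finds; it does not run configure or Octopus itself.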
In some run modes (e.g., td), you can use the multi-level parallelization, i.e., to run in parallel in more than one way at the same time. In the td case, you can run parallel in states and in domains at the same time. In order to fine-tune this behavior, please take a look at the variables, , , and . In order to check if everything is OK, take a look at the output of Octopus in the "Parallelization" section. This is an example:
************************** Parallelization ***************************
Octopus will run in *parallel*
Info: Number of nodes in par_states group: 8 ( 62)
Info: Octopus will waste at least 9.68% of computer time
**********************************************************************
In this case, Octopus runs in parallel only in states, using 8 processors (for 62 states). Furthermore, some of the processors will be idle for 9.68% of the time (this is not that great, so maybe a different number of processors would be better in this case).
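The numbers in this example are consistent with a simple estimate: 62 states on 8 processors leave a remainder of 6 states, and 6/62 is 9.68%. The sketch below reproduces that arithmetic so that other processor counts can be compared. The formula is an inference from the figures shown here, not a documented description of how Octopus computes the value, and the function name waste_percent is hypothetical.

```shell
# Hypothetical helper: percentage given by (states mod procs) / states.
# This formula is an assumption that happens to match the example output.
waste_percent() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 100 * (n % p) / n }'
}

waste_percent 62 8   # the example above
waste_percent 64 8   # states divide evenly among the processors
```

With 62 states on 8 processors this prints 9.68; with 64 states it prints 0.00.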
Passing arguments from environment variables
Octopus can also read input variables from the environment variables of the shell. This is especially useful if you want to call Octopus from scripts or if you want to pass one-time arguments when running Octopus.
To activate this feature you must first set OCT_PARSE_ENV to some value, for example in sh/bash:
$ export OCT_PARSE_ENV=1
After that, to pass an input variable to Octopus, you must prefix the name of the variable with 'OCT_', for example:
$ export OCT_Radius=10
After that you can run Octopus as usual.
You can also pass the variables in the same command line:
$ OCT_PARSE_ENV=1 OCT_Radius=10 octopus
If you set a variable both in the input file and as an environment variable, the environment variable takes precedence. Be careful with this behaviour. Blocks are not supported (suggestions on how to elegantly put a block inside environment variables are welcome).
There is an additional environment variable OCTOPUS_SHARE that can be set to specify the location of the variables file used by the parser. If this environment variable is not set, the default is as set by the configure script, namely prefix/share/octopus. You can set this to something else if necessary, for example to handle moving an executable from one machine to another, where the original path does not exist on the new machine.
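A minimal sketch of setting OCTOPUS_SHARE safely is given below. The helper name use_octopus_share and the example path are placeholders, not part of Octopus; the helper only exports the variable when the directory exists, so a mistyped path is caught before Octopus starts.

```shell
# Hypothetical helper: export OCTOPUS_SHARE only if the directory exists.
use_octopus_share() {
  if [ -d "$1" ]; then
    export OCTOPUS_SHARE="$1"
  else
    echo "directory not found: $1" >&2
    return 1
  fi
}

# Example call; the path is a placeholder for wherever share/octopus
# lives on the new machine.
use_octopus_share "$HOME/octopus/share/octopus" || echo "OCTOPUS_SHARE left unchanged"
```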
For example, the following simple script helps to determine the optimal Radius of the simulation box (an inp file with the rest of the parameters has to be provided):
export OCT_PARSE_ENV=1
for OCT_Radius in `seq 5 16`
do
  export OCT_Radius
  octopus >& out-$OCT_Radius
  energy=`grep Total static/info | head -1 | cut -d "=" -f 2`
  echo $OCT_Radius $energy
done