Optimization
Name HamiltonianApplyPacked
Section Execution::Optimization
Type logical
Default yes
If set to yes (the default), Octopus will ‘pack’ the
wave-functions when operating with them. This might involve some
additional copying but makes operations more efficient.
See also the related StatesPack variable.
Name MemoryLimit
Section Execution::Optimization
Type integer
Default -1
If positive, Octopus will stop if more memory than MemoryLimit
(in kb) is requested. Note that this variable only works when
ProfilingMode is set to prof_memory or prof_memory_full.
Name MeshBlockDirection
Section Execution::Optimization
Type integer
Determines the direction in which the dimensions are chosen to compute
the blocked index for sorting the mesh points (see MeshBlockSize).
The default is increase_with_dimension, corresponding to xyz ordering
in 3D.
Options:
- increase_with_dimension:
The fastest changing index is in the first dimension, i.e., in 3D this
corresponds to ordering in xyz directions.
- decrease_with_dimension:
The fastest changing index is in the last dimension, i.e., in 3D this
corresponds to ordering in zyx directions.
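
The difference between the two options can be illustrated with a small Python sketch. This is not Octopus's blocked-index code; the function name ordered_points and the grid shape are hypothetical, and the sketch only shows which index changes fastest under each option.

```python
from itertools import product


def ordered_points(shape, direction="increase_with_dimension"):
    """Return grid index tuples in the order implied by `direction`.

    Illustrative only (hypothetical helper, not part of Octopus):
    - increase_with_dimension: the first index (x) changes fastest.
    - decrease_with_dimension: the last index (z in 3D) changes fastest.
    """
    ranges = [range(n) for n in shape]
    if direction == "decrease_with_dimension":
        # itertools.product varies its last argument fastest.
        return list(product(*ranges))
    if direction == "increase_with_dimension":
        # Iterate over the reversed dimensions, then flip each tuple back,
        # so that the first index ends up varying fastest.
        return [p[::-1] for p in product(*reversed(ranges))]
    raise ValueError(f"unknown direction: {direction}")
```

For a 2 x 3 grid, the first option visits (0, 0), (1, 0), (0, 1), ... while the second visits (0, 0), (0, 1), (0, 2), (1, 0), ...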
Name MeshBlockSize
Section Execution::Optimization
Type block
To improve memory-access locality when calculating derivatives,
Octopus arranges mesh points in blocks. This variable
controls the size of these blocks in the different
directions. The default is selected according to the value of
the StatesBlockSize variable. (This variable only affects the
performance of Octopus and not the results.)
Name MeshLocalBlockDirection
Section Execution::Optimization
Type integer
Determines the direction in which the dimensions are chosen to compute
the blocked index for sorting the mesh points (see MeshLocalBlockSize).
The default is increase_with_dimension, corresponding to xyz ordering
in 3D.
Options:
- increase_with_dimension:
The fastest changing index is in the first dimension, i.e., in 3D this
corresponds to ordering in xyz directions.
- decrease_with_dimension:
The fastest changing index is in the last dimension, i.e., in 3D this
corresponds to ordering in zyx directions.
Name MeshLocalBlockSize
Section Execution::Optimization
Type block
To improve memory-access locality when calculating derivatives,
Octopus arranges mesh points in blocks. This variable
controls the size of these blocks in the different
directions. The default is selected according to the value of
the StatesBlockSize variable. (This variable only affects the
performance of Octopus and not the results.)
Name MeshLocalOrder
Section Execution::Optimization
Type integer
Default blocks
This variable controls how the grid points are mapped to a
linear array. This influences the performance of the code.
Options:
- order_blocks:
The grid is mapped using small parallelepipedic grids. The size
of the blocks is controlled by MeshBlockSize.
- order_cube:
The grid is mapped using a full cube, i.e. without blocking.
- order_global:
Use the ordering from the global mesh.
Name MeshOrder
Section Execution::Optimization
Type integer
This variable controls how the grid points are mapped to a
linear array for global arrays. For runs that are parallel
in domains, the local mesh order may be different (see
MeshLocalOrder).
The default is blocks when serial in domains and cube when
parallel in domains with the local mesh order set to blocks.
Options:
- order_blocks:
The grid is mapped using small parallelepipedic grids. The size
of the blocks is controlled by MeshBlockSize.
- order_original:
The original order of the indices is used to map the grid.
- order_cube:
The grid is mapped using a full cube, i.e. without blocking.
Name NLOperatorCompactBoundaries
Section Execution::Optimization
Type logical
Default no
(Experimental) When set to yes, for finite systems Octopus will
map boundary points for finite-difference operators to a few
memory locations. This increases performance; however, it is
experimental and has not been thoroughly tested.
Name OperateAccel
Section Execution::Optimization
Type integer
Default map
This variable selects the subroutine used to apply non-local
operators over the grid when an accelerator device is used.
Options:
- invmap:
The standard implementation ported to OpenCL.
- map:
A different version, more suitable for GPUs.
Name OperateComplex
Section Execution::Optimization
Type integer
Default optimized
This variable selects the subroutine used to apply non-local
operators over the grid for complex functions.
Options:
- fortran:
The standard Fortran function.
- optimized:
This version is optimized using vector primitives (if available).
Name OperateDouble
Section Execution::Optimization
Type integer
Default optimized
This variable selects the subroutine used to apply non-local
operators over the grid for real functions.
Options:
- fortran:
The standard Fortran function.
- optimized:
This version is optimized using vector primitives (if available).
Name ProfilingAllNodes
Section Execution::Optimization
Type logical
Default no
This variable controls whether all nodes print the time
profiling output. If set to no, the default, only the root node
will write the profile. If set to yes, all nodes will print it.
Name ProfilingMode
Section Execution::Optimization
Type integer
Default no
Use this variable to run Octopus in profiling mode. In this mode
Octopus records the time spent in certain areas of the code and
the number of times this code is executed. These numbers
are written in ./profiling.NNN/profiling.nnn with nnn being the
node number (000 in serial) and NNN the number of processors.
This is mainly for development purposes. Note, however, that
Octopus should be compiled with --disable-debug to do proper
profiling. Warning: you may encounter strange results with OpenMP.
Options:
- no:
No profiling information is generated.
- prof_time:
Profile the time spent in defined profiling regions.
- prof_memory:
As well as the time, summary information on memory usage and the largest arrays are reported.
- prof_memory_full:
As well as the time and summary memory information, a
log is reported of every allocation and deallocation.
- likwid:
Enable instrumentation using LIKWID.
- prof_io:
Count the number of file open and close operations.
Name ProfilingOutputTree
Section Execution::Optimization
Type logical
Default yes
This variable controls whether the profiling output is additionally
written as a tree.
Name ProfilingOutputYAML
Section Execution::Optimization
Type logical
Default no
This variable controls whether the profiling output is additionally
written to a YAML file.
Name StatesBlockSize
Section Execution::Optimization
Type integer
Some routines work over blocks of eigenfunctions, which
generally improves performance at the expense of increased
memory consumption. This variable selects the size of the
blocks to be used. If GPUs are used, the default is the
warp size (32 for NVIDIA, 32 or 64 for AMD);
otherwise it is 4.
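
As a rough illustration of what working over blocks of eigenfunctions means, the Python sketch below splits a list of state indices into consecutive blocks. The helper name state_blocks is hypothetical, and how Octopus actually groups states internally is an assumption here; the default of 4 is the CPU default quoted above.

```python
def state_blocks(n_states, block_size=4):
    """Split state indices 0..n_states-1 into consecutive blocks.

    Hypothetical helper for illustration; the last block may be shorter
    when n_states is not a multiple of block_size.
    """
    if n_states < 0 or block_size < 1:
        raise ValueError("n_states must be >= 0 and block_size >= 1")
    return [
        list(range(start, min(start + block_size, n_states)))
        for start in range(0, n_states, block_size)
    ]
```

With 10 states and the default block size, this gives two blocks of 4 states and one block of 2. A larger block size means fewer, larger blocks, which matches the trade-off between performance and memory consumption described above.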
Name StatesCLDeviceMemory
Section Execution::Optimization
Type float
Default -512
This variable selects the amount of OpenCL device memory that
will be used by Octopus to store the states.
A positive number smaller than 1 indicates a fraction of the total
device memory. A number larger than 1 indicates an absolute
amount of memory in megabytes. A negative number indicates an
amount of memory in megabytes that is subtracted from
the total device memory.
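
The three cases can be summarized in a short Python sketch. The function name states_device_memory_mb and the total_mb argument are hypothetical, and the values 0 and exactly 1 are rejected here only because the description above does not define them; Octopus itself may treat them differently.

```python
def states_device_memory_mb(setting, total_mb):
    """Interpret a StatesCLDeviceMemory value against the total device memory.

    Hypothetical helper for illustration; all amounts are in megabytes.
    """
    if setting < 0:
        # Negative: amount subtracted from the total device memory.
        return total_mb + setting
    if 0 < setting < 1:
        # Positive and smaller than 1: fraction of the total device memory.
        return setting * total_mb
    if setting > 1:
        # Larger than 1: absolute amount of memory.
        return setting
    # Assumption: 0 and 1 are not covered by the description.
    raise ValueError("values 0 and 1 are not defined by the description")
```

On a device with 8192 MB, the default of -512 would leave 7680 MB for the states, while a value of 0.5 would correspond to 4096 MB.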
Name StatesPack
Section Execution::Optimization
Type logical
When set to yes, states are stored in packed mode, which improves
performance considerably. Not all parts of the code will profit from
this, but all of them should nevertheless work regardless of how the
states are stored.
If GPUs are used and this variable is set to yes, Octopus will store the wave-functions in device (GPU) memory. If there is not enough memory to store all the wave-functions, execution will stop with an error.
See also the related HamiltonianApplyPacked variable.
The default is yes.