Octopus
This module implements batches of mesh functions.
In many situations, we need to perform the same operations over many mesh functions, such as the electronic wave functions. It is therefore advantageous to group those functions into one object. This can ensure that different mesh functions are contiguous in memory.
Stencil operations make up a large part of the low-level operations on mesh functions. It is often more efficient to apply the same stencil operation across different mesh functions (i.e. using the state index as the fast index) than to loop first over the mesh index, which would, in general, require a different stencil for each mesh point. This holds in particular for calculations on GPUs.
Therefore, we store mesh functions in linear or in so-called packed form. The former refers to the natural ordering, where the mesh index is the fastest moving, while the latter is transposed, so that the state index is the fastest moving. Furthermore, the arrays are padded to ensure aligned memory access.
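The difference between the two storage orders can be sketched with flat offsets. This is a minimal illustration only: the helper names, the zero-based indices and the omission of padding are assumptions made for the example and do not reflect the actual Octopus arrays.

```python
# Illustrative model of linear vs. packed storage (zero-based, no padding).
# All names here are hypothetical; they are not Octopus identifiers.

def linear_offset(ip, ist, np_):
    """Flat offset when the mesh index ip is the fastest-moving index."""
    return ist * np_ + ip

def packed_offset(ip, ist, nst):
    """Flat offset when the state index ist is the fastest-moving index."""
    return ip * nst + ist

def to_packed(linear, np_, nst):
    """Transpose a linear-form buffer into packed form."""
    packed = [0.0] * (np_ * nst)
    for ist in range(nst):
        for ip in range(np_):
            packed[packed_offset(ip, ist, nst)] = linear[linear_offset(ip, ist, np_)]
    return packed

np_, nst = 4, 3
# Function ist takes the value 10*ist + ip at mesh point ip.
linear = [10.0 * ist + ip for ist in range(nst) for ip in range(np_)]
packed = to_packed(linear, np_, nst)
```

In the packed buffer, the values of all functions at one mesh point sit next to each other, which is what lets a single stencil be applied across the whole batch.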
The packed form is even more advantageous on the GPU. Therefore, only packed data is stored in the device memory. On the devices, the padding is aligned with the size of a work group and can depend on the actual device.
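Padding to an alignment boundary usually amounts to rounding a size up to the next multiple of that boundary. The sketch below shows only this arithmetic. The function name and the example work-group size of 32 are assumptions; which array dimension Octopus pads, and by how much, is not stated here.

```python
# Hypothetical helper: round a size up to a multiple of an alignment.
# The alignment would be, for example, a device work-group size (assumed value below).

def padded_size(n, alignment):
    """Return the smallest multiple of alignment that is >= n."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return ((n + alignment - 1) // alignment) * alignment

work_group_size = 32           # assumed; the real value depends on the device
np_padded = padded_size(1000, work_group_size)
```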
Data Types
interface | batch_init | initialize a batch with existing memory |
type | batch_t | Class defining batches of mesh functions. |
Functions/Subroutines
subroutine | batch_end (this, copy) | finalize a batch and release allocated memory, if necessary |
subroutine | batch_deallocate_unpacked_host (this) | release unpacked host memory |
subroutine | batch_deallocate_packed_host (this) | release packed host memory |
subroutine | batch_deallocate_packed_device (this) | release packed device memory |
subroutine | batch_allocate_unpacked_host (this) | allocate host (CPU) memory for unpacked data |
subroutine | batch_allocate_packed_host (this) | allocate host (CPU) memory for packed data |
subroutine | batch_allocate_packed_device (this) | allocate device (GPU) memory for packed data |
subroutine | batch_init_empty (this, dim, nst, np) | initialize an empty batch |
subroutine | batch_clone_to (this, dest, pack, copy_data, new_np) | clone a batch to a new batch |
subroutine | batch_clone_to_array (this, dest, n_batches, pack, copy_data) | |
subroutine | batch_copy_to (this, dest, pack, copy_data, new_np, special) | make a copy of a batch |
type(type_t) pure function | batch_type (this) | return the type of a batch |
integer pure function | batch_type_as_integer (this) | For debugging purposes only. |
integer pure function | batch_status (this) | return the status of a batch |
logical pure function | batch_is_packed (this) | |
integer(int64) function | batch_pack_total_size (this) | |
subroutine | batch_do_pack (this, copy, async) | pack the data in a batch |
subroutine | batch_do_unpack (this, copy, force, async) | unpack a batch |
subroutine | batch_finish_unpack (this) | finish the unpacking if do_unpack() was called with async=.true. |
subroutine | batch_write_unpacked_to_device (this) | |
subroutine | batch_read_device_to_unpacked (this) | |
subroutine | batch_write_packed_to_device (this, async) | |
subroutine, public | batch_read_device_to_packed (this, async) | |
integer function | batch_inv_index (this, cind) | inverse index lookup |
integer pure function | batch_ist_idim_to_linear (this, cind) | direct index lookup |
integer pure function | batch_linear_to_ist (this, linear_index) | get state index ist from linear (combined dim and nst) index |
integer pure function | batch_linear_to_idim (this, linear_index) | extract idim from linear index |
subroutine | batch_remote_access_start (this, mpi_grp, rma_win) | start remote access to a batch on another node |
subroutine | batch_remote_access_stop (this, rma_win) | stop the remote access to the batch |
subroutine | batch_copy_data_to (this, np, dest, async) | copy data to another batch. |
subroutine | batch_check_compatibility_with (this, target, only_check_dim) | check whether two batches have compatible dimensions (and type) |
subroutine | batch_build_indices (this, st_start, st_end) | build the index ist(:) and ist_idim_index(:,:) and set pack_size |
subroutine | dbatch_init_with_memory_3 (this, dim, st_start, st_end, psi) | initialize a batch with a rank-3 array of TYPE_FLOAT valued mesh functions psi. |
subroutine | dbatch_init_with_memory_2 (this, dim, st_start, st_end, psi) | initialize a batch with a rank-2 array of TYPE_FLOAT valued mesh functions psi. |
subroutine | dbatch_init_with_memory_1 (this, psi) | initialize a batch with a rank-1 array of TYPE_FLOAT valued mesh functions psi. |
subroutine | dbatch_allocate_unpacked_host (this) | allocate host (CPU) memory for unpacked data of type TYPE_FLOAT |
subroutine | dbatch_allocate_packed_host (this) | allocate host (CPU) memory for packed data of type TYPE_FLOAT |
subroutine, public | dbatch_init (this, dim, st_start, st_end, np, special, packed) | initialize a TYPE_FLOAT valued batch to given size without providing external memory |
subroutine | dbatch_pack_copy (this) | copy data from the unpacked to the packed arrays |
subroutine | dbatch_unpack_copy (this) | copy data from the packed to the unpacked arrays |
subroutine | zbatch_init_with_memory_3 (this, dim, st_start, st_end, psi) | initialize a batch with a rank-3 array of TYPE_CMPLX valued mesh functions psi. |
subroutine | zbatch_init_with_memory_2 (this, dim, st_start, st_end, psi) | initialize a batch with a rank-2 array of TYPE_CMPLX valued mesh functions psi. |
subroutine | zbatch_init_with_memory_1 (this, psi) | initialize a batch with a rank-1 array of TYPE_CMPLX valued mesh functions psi. |
subroutine | zbatch_allocate_unpacked_host (this) | allocate host (CPU) memory for unpacked data of type TYPE_CMPLX |
subroutine | zbatch_allocate_packed_host (this) | allocate host (CPU) memory for packed data of type TYPE_CMPLX |
subroutine, public | zbatch_init (this, dim, st_start, st_end, np, special, packed) | initialize a TYPE_CMPLX valued batch to given size without providing external memory |
subroutine | zbatch_pack_copy (this) | copy data from the unpacked to the packed arrays |
subroutine | zbatch_unpack_copy (this) | copy data from the packed to the unpacked arrays |
Variables
integer, parameter, public | batch_not_packed = 0 | functions are stored in CPU memory, unpacked order |
integer, parameter, public | batch_packed = 1 | functions are stored in CPU memory, in transposed (packed) order |
integer, parameter, public | batch_device_packed = 2 | functions are stored in device memory in packed order |
integer, parameter | cl_pack_max_buffer_size = 4 | this value controls the size (in number of wave functions) of the buffer used to copy states to the OpenCL device. |
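A buffer limited to a fixed number of wave functions suggests that states are staged in groups. The sketch below shows one way such a grouping could be computed. Only the value 4 comes from the table above; the function name, the one-based inclusive ranges and the idea of splitting into consecutive groups are assumptions, not a description of the Octopus transfer code.

```python
# Assumed staging scheme: split nst states into consecutive groups that
# each fit into a buffer holding at most max_chunk wave functions.

CL_PACK_MAX_BUFFER_SIZE = 4    # value listed for cl_pack_max_buffer_size

def chunk_ranges(nst, max_chunk=CL_PACK_MAX_BUFFER_SIZE):
    """Yield (first, last) one-based, inclusive state ranges of at most max_chunk states."""
    first = 1
    while first <= nst:
        last = min(first + max_chunk - 1, nst)
        yield first, last
        first = last + 1

ranges_for_ten_states = list(chunk_ranges(10))
```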
|
private |
finalize a batch and release allocated memory, if necessary
If the batch was initialized with 'external' memory, this routine ensures that this memory is up to date when the batch is finalized. This means that the data is copied from the device (if requested) and unpacked.
[in] | copy | do we need to copy data from the device? Default = .true. (from batch_oct_m::batch_do_unpack) |
|
private |
initialize an empty batch
This auxiliary function is only called from the batch_oct_m::batch_init functions. It initializes the book-keeping parameters, allocates memory for the indices and nullifies the pointers to the data arrays.
[out] | this | the batch to be initialized |
[in] | dim | The number of spin dimensions |
[in] | nst | The number of states in the batch |
[in] | np | The number of points in each mesh function |
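The index book-keeping relates a combined linear index to a pair (ist, idim). A minimal sketch of such a mapping follows. The ordering (idim running fastest within each state), the one-based indices and all function names are assumptions chosen for illustration; the actual contents of ist(:) and ist_idim_index(:,:) may differ.

```python
# Hypothetical index table: one-based linear index -> (ist, idim),
# assuming idim is the fastest-moving index within each state.

def build_indices(dim, st_start, st_end):
    """Return a list whose entry k-1 holds (ist, idim) for linear index k."""
    table = []
    for ist in range(st_start, st_end + 1):
        for idim in range(1, dim + 1):
            table.append((ist, idim))
    return table

def linear_to_ist(table, linear_index):
    """State index for a one-based linear index."""
    return table[linear_index - 1][0]

def linear_to_idim(table, linear_index):
    """Spin-dimension index for a one-based linear index."""
    return table[linear_index - 1][1]

def ist_idim_to_linear(table, ist, idim):
    """Inverse lookup: one-based linear index for a given (ist, idim) pair."""
    return table.index((ist, idim)) + 1

table = build_indices(dim=2, st_start=5, st_end=7)
```

With dim = 2 and states 5 to 7, the table holds six entries, so the number of functions stored in the batch is dim times the number of states.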
|
private |
clone a batch to a new batch
This routine clones the metadata of a batch and, if requested, copies the data.
[in] | this | source batch |
[out] | dest | destination batch |
[in] | pack | If .false. the new batch will not be packed. Default: batch_is_packed(this) |
[in] | copy_data | If .true. the batch data will be copied to the destination batch. Default: .false. |
[in] | new_np | If present, this replaces this%np in the initialization |
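The defaults of pack and copy_data can be illustrated with a small model. This is a sketch under stated assumptions: the Batch class, its fields and the zero-initialized destination are hypothetical, and new_np is left out because its effect on the copied data is not described here.

```python
# Hypothetical model of cloning metadata with an optional data copy.
from dataclasses import dataclass, field

@dataclass
class Batch:
    dim: int
    nst: int
    np: int                      # number of mesh points per function
    packed: bool = False
    data: list = field(default_factory=list)   # one list of np values per function

def clone_to(src, pack=None, copy_data=False):
    """Clone the metadata of src; copy the function values only if copy_data is true."""
    dest = Batch(dim=src.dim, nst=src.nst, np=src.np,
                 packed=src.packed if pack is None else pack)
    # Fresh storage of the same shape, so dest never aliases src.
    dest.data = [[0.0] * src.np for _ in range(src.dim * src.nst)]
    if copy_data:
        for i, func in enumerate(src.data):
            dest.data[i] = list(func)
    return dest

src = Batch(dim=1, nst=2, np=3, packed=True,
            data=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
meta_only = clone_to(src)                          # inherits packed, data zeroed
full_copy = clone_to(src, pack=False, copy_data=True)
```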
|
private |
make a copy of a batch
This routine can perform a deep or a shallow copy of a batch
[in] | this | The source batch |
[out] | dest | The destination batch |
[in] | pack | If .false. the new batch will not be packed. Default: batch_is_packed(this) |
[in] | copy_data | If .true. the batch data will be copied to the destination batch. Default: .false. |
[in] | new_np | If present, this replaces this%np in the initialization |
[in] | special | If present, this replaces special in the logic below, i.e., we try to allocate on the GPU |
|
private |
pack the data in a batch
If accelerators are enabled, the packed data is moved to the device memory. If the batch is already packed, a counter is increased to keep track of when to unpack.
[in,out] | this | The current batch |
[in] | copy | Do we copy the data to the packed memory? (default .true.) |
[in] | async | We can do an asynchronous operation. (default .false.) The program flow can continue while data is being transferred to the device. |
|
private |
unpack a batch
We unpack the batch if the 'packing counter' is one, or the force flag is given.
[in] | copy | indicate whether to copy the data (default .true.) |
[in] | force | if force = .true., unpack independently of the counter (default .false.) |
[in] | async | indicate whether the operation can be asynchronous (default .false.). In this case the operation has to be completed by calling batch_finish_unpack() |
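The interplay between packing and unpacking resembles reference counting. The sketch below models only the counter logic. The class and attribute names are hypothetical, the decrement in the non-final case is an assumption, and data movement, device transfers and the copy and async options are left out.

```python
# Hypothetical model of the 'packing counter'; no data is moved here.

BATCH_NOT_PACKED = 0
BATCH_PACKED = 1

class PackState:
    def __init__(self):
        self.status = BATCH_NOT_PACKED
        self.pack_count = 0          # assumed name for the packing counter

    def do_pack(self):
        """Pack on the first call; later calls only increase the counter."""
        if self.pack_count == 0:
            self.status = BATCH_PACKED
        self.pack_count += 1

    def do_unpack(self, force=False):
        """Unpack if the counter is one or force is set; otherwise decrease the counter."""
        if self.pack_count == 0:
            return                   # nothing to unpack
        if self.pack_count == 1 or force:
            self.status = BATCH_NOT_PACKED
            self.pack_count = 0
        else:
            self.pack_count -= 1     # assumption: nested packs are counted down

state = PackState()
state.do_pack()
state.do_pack()      # already packed: counter goes to 2
state.do_unpack()    # counter back to 1, still packed
```

Nested callers can therefore each pair a pack with an unpack, and only the outermost unpack changes the storage form.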
subroutine, public batch_oct_m::batch_read_device_to_packed (class(batch_t), intent(inout) this, logical, intent(in), optional async)
|
private |
start remote access to a batch on another node
This routine creates a remote access window for a given batch and returns a handle to that window. A handle of -1 indicates that no window was created.
[in,out] | this | the current batch |
[in] | mpi_grp | the MPI group |
[out] | rma_win | handle of rma window |
|
private |
initialize a batch with a rank-2 array of TYPE_FLOAT valued mesh functions psi.
The TYPE_FLOAT valued mesh functions psi are expected to be either of the following dimensions:
where np_batch can be either np or np_part.
Either dim==1 or st_start==st_end has to be fulfilled.
|
private |
initialize a batch with a rank-1 array of TYPE_FLOAT valued mesh functions psi.
The TYPE_FLOAT valued mesh functions psi are expected to be of dimensions (1:np_batch) where np_batch can be either np or np_part. In this case, idim=1 and st_start=st_end=1.
subroutine, public batch_oct_m::dbatch_init (class(batch_t), intent(inout) this, integer, intent(in) dim, integer, intent(in) st_start, integer, intent(in) st_end, integer, intent(in) np, logical, intent(in), optional special, logical, intent(in), optional packed)
initialize a TYPE_FLOAT valued batch to given size without providing external memory
[in,out] | this | the batch to initialize |
[in] | dim | Spinor dimension of the state (one, or two for spinors) |
[in] | st_start | index of first state of the batch |
[in] | st_end | index of last state of the batch |
[in] | np | number of points in each function (this can be np or np_part) |
[in] | special | If .true., the allocation will be handled in C (to use pinned memory for GPUs). Default = .false. |
[in] | packed | If .true. the batch will be initialized in packed form. Default = .false. |
|
private |
initialize a batch with a rank-2 array of TYPE_CMPLX valued mesh functions psi.
The TYPE_CMPLX valued mesh functions psi are expected to be either of the following dimensions:
where np_batch can be either np or np_part.
Either dim==1 or st_start==st_end has to be fulfilled.
|
private |
initialize a batch with a rank-1 array of TYPE_CMPLX valued mesh functions psi.
The TYPE_CMPLX valued mesh functions psi are expected to be of dimensions (1:np_batch) where np_batch can be either np or np_part. In this case, idim=1 and st_start=st_end=1.
subroutine, public batch_oct_m::zbatch_init (class(batch_t), intent(inout) this, integer, intent(in) dim, integer, intent(in) st_start, integer, intent(in) st_end, integer, intent(in) np, logical, intent(in), optional special, logical, intent(in), optional packed)
initialize a TYPE_CMPLX valued batch to given size without providing external memory
[in,out] | this | the batch to initialize |
[in] | dim | Spinor dimension of the state (one, or two for spinors) |
[in] | st_start | index of first state of the batch |
[in] | st_end | index of last state of the batch |
[in] | np | number of points in each function (this can be np or np_part) |
[in] | special | If .true., the allocation will be handled in C (to use pinned memory for GPUs). Default = .false. |
[in] | packed | If .true. the batch will be initialized in packed form. Default = .false. |
integer, parameter, public batch_oct_m::batch_not_packed = 0 |
integer, parameter, public batch_oct_m::batch_packed = 1 |
integer, parameter, public batch_oct_m::batch_device_packed = 2 |