Octopus
|
Functions/Subroutines | |
subroutine, public | assign_points_to_centroids_finite_bc (mesh, centroids, ip_to_ic) |
Assign each grid point to the closest centroid. A centroid and its set of nearest grid points defines a cluster. More... | |
subroutine, public | update_centroids (mesh, weight, ip_to_ic, centroids) |
Compute a new set of centroids. More... | |
subroutine | compute_grid_difference (points, updated_points, tol, points_differ) |
Compute the difference in two grids as \(abs(\mathbf{g}_1 - \mathbf{g}_2)\). More... | |
subroutine | report_differences_in_grids (points, updated_points, tol, points_differ) |
Report differences returned from compute_grid_difference . More... | |
subroutine, public | weighted_kmeans (space, mesh, weight, centroids, n_iter, centroid_tol, discretize) |
Weighted K-means clustering. More... | |
subroutine, public kmeans_clustering_oct_m::assign_points_to_centroids_finite_bc | ( | class(mesh_t), intent(in) | mesh, |
real(real64), dimension(:, :), intent(in) | centroids, | ||
integer, dimension(:), intent(out) | ip_to_ic | ||
) |
Assign each grid point to the closest centroid. A centroid and its set of nearest grid points defines a cluster.
This can be mathematically expressed as:
\[ C_\mu=\left\{Z\left(\mathbf{r}_i\right) \mid \operatorname{dist}\left(Z\left(\mathbf{r}_i\right), Z\left(\mathbf{r}_\mu\right)\right) \leq \operatorname{dist}\left(Z\left(\mathbf{r}_i\right), Z\left(\mathbf{r}_m\right)\right) \text { for all } m \neq \mu\right\} \]
where \( Z(\mathbf{r}_i) \) are data points and \( Z(\mathbf{r}_\mu) \) is the centroid. The distance metric, \(\operatorname{dist}\), is defined as the dot product in this implementation.
See eqs 10-13 in [Complex-valued K-means clustering of interpolative separable density fitting algorithm for large-scale hybrid functional enabled ab initio molecular dynamics simulations within plane waves](https:
[in] | mesh | Real-space grid (np, spacedim) |
[in] | centroids | Centroid positions (spacedim, Ncentroids) |
[out] | ip_to_ic | Index array that maps grid indices to centroid indices |
Definition at line 159 of file kmeans_clustering.F90.
subroutine, public kmeans_clustering_oct_m::update_centroids | ( | class(mesh_t), intent(in) | mesh, |
real(real64), dimension(:), intent(in) | weight, | ||
integer, dimension(:), intent(in) | ip_to_ic, | ||
real(real64), dimension(:, :), intent(inout), contiguous | centroids | ||
) |
Compute a new set of centroids.
A centroid is defined as:
\[ \mathbf{r}_\mu = \frac{\sum_{\mathbf{r}_j \in C_\mu} \mathbf{r}_j w(\mathbf{r}_j)}{\sum_{\mathbf{r}_j \in C_\mu} w(\mathbf{r}_j)} \]
where \(\mathbf{r}_j\) and \(w(\mathbf{r}_j\) are grid points and weights restricted to the cluster \( C_\mu \), respectively.
If using domain decomposition, the routine initially computes centroids using only the local set of grid points, then reduces this sum over all domains.
[in] | mesh | Real-space grid instance |
[in] | weight | Weights (meshnp) |
[in] | ip_to_ic | Index array that maps grid indices to centroid indices |
[in,out] | centroids | In: centroid positions (spacedim, Ncentroids) |
Definition at line 212 of file kmeans_clustering.F90.
|
private |
Compute the difference in two grids as \(abs(\mathbf{g}_1 - \mathbf{g}_2)\).
If a component of the difference vector for grid point ip is greater than the tolerance, the element points_differ(ip) is set to true.
[in] | points | Real-space grid (n_dims, N) |
[in] | updated_points | Real-space grid (n_dims, N) |
[in] | tol | Tolerance |
[out] | points_differ | |a_i - b_i| |
Definition at line 270 of file kmeans_clustering.F90.
|
private |
Report differences returned from compute_grid_difference
.
[in] | points | Real-space grids (n_dims, N) |
[in] | points_differ | If any element of point_i > tol |
Definition at line 301 of file kmeans_clustering.F90.
subroutine, public kmeans_clustering_oct_m::weighted_kmeans | ( | class(space_t), intent(in) | space, |
class(mesh_t), intent(in) | mesh, | ||
real(real64), dimension(:), intent(in) | weight, | ||
real(real64), dimension(:, :), intent(inout), contiguous | centroids, | ||
integer, intent(in), optional | n_iter, | ||
real(real64), intent(in), optional | centroid_tol, | ||
logical, intent(in), optional | discretize | ||
) |
Weighted K-means clustering.
The K-means algorithm divides a set of \(N_r\) samples (in this case, grid points) into \(N_\mu\) disjoin clusters \(C\). The mean of each cluster defines the cluster centroid. Note that centroids are not, in general, points from the discrete grid
Given a grid, and some initial guess at centroids, the K-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion:
\[ argmin \sum^{N_\mu}_{\mu=1} \sum_{\mathbf{r}_\mathbf{k} \in C_\mu} || Z(\mathbf{r}_k) - Z(\mathbf{r}_\mu)||^2 \]
This implementation is based on Algorithm 1. given in [Complex-valued K-means clustering of interpolative separable density fitting algorithm for large-scale hybrid functional enabled ab initio molecular dynamics simulations within plane waves](https: however it is equivalent to implementations found in packages such as [scikit-learn](https:
The K-means algorithm consists of looping between two steps:
assign_points_to_centroids_finite_bc
update_centroids
The difference between the old and the new centroids are computed, and the algorithm repeats these last two steps until this value is less than a threshold.[in] | space | Spatial dimensions and periodic dimensions |
[in] | mesh | Real-space grid instance |
[in] | weight | Weights (n_points) |
[in,out] | centroids | In: Initial centroids (n_dim, n_centroid) |
[in] | n_iter | Optional max number of iterations |
[in] | centroid_tol | Optional convergence criterion |
[in] | discretize | Optional Discretize centroid values to grid points |
Definition at line 365 of file kmeans_clustering.F90.