![]()  | 
  
    Octopus
    
   | 
 
Functions/Subroutines | |
| subroutine, public | assign_points_to_centroids_finite_bc (mesh, centroids, ip_to_ic) | 
| Assign each grid point to the closest centroid. A centroid and its set of nearest grid points defines a cluster.  More... | |
| subroutine, public | update_centroids (mesh, weight, ip_to_ic, centroids) | 
| Compute a new set of centroids.  More... | |
| subroutine | compute_grid_difference (points, updated_points, tol, points_differ) | 
| Compute the difference in two grids as \(abs(\mathbf{g}_1 - \mathbf{g}_2)\).  More... | |
| subroutine | report_differences_in_grids (points, updated_points, tol, points_differ) | 
Report differences returned from compute_grid_difference.  More... | |
| subroutine, public | weighted_kmeans (space, mesh, weight, centroids, n_iter, centroid_tol, discretize) | 
| Weighted K-means clustering.  More... | |
| subroutine, public kmeans_clustering_oct_m::assign_points_to_centroids_finite_bc | ( | class(mesh_t), intent(in) | mesh, | 
| real(real64), dimension(:, :), intent(in) | centroids, | ||
| integer, dimension(:), intent(out) | ip_to_ic | ||
| ) | 
Assign each grid point to the closest centroid. A centroid and its set of nearest grid points defines a cluster.
This can be mathematically expressed as:
\[ C_\mu=\left\{Z\left(\mathbf{r}_i\right) \mid \operatorname{dist}\left(Z\left(\mathbf{r}_i\right), Z\left(\mathbf{r}_\mu\right)\right) \leq \operatorname{dist}\left(Z\left(\mathbf{r}_i\right), Z\left(\mathbf{r}_m\right)\right) \text { for all } m \neq \mu\right\} \]
where \( Z(\mathbf{r}_i) \) are data points and \( Z(\mathbf{r}_\mu) \) is the centroid. The distance metric, \(\operatorname{dist}\), is defined as the dot product in this implementation.
See eqs 10-13 in [Complex-valued K-means clustering of interpolative separable density fitting algorithm for large-scale hybrid functional enabled ab initio molecular dynamics simulations within plane waves](https:
| [in] | mesh | Real-space grid (np, spacedim) | 
| [in] | centroids | Centroid positions (spacedim, Ncentroids) | 
| [out] | ip_to_ic | Index array that maps grid indices to centroid indices | 
Definition at line 159 of file kmeans_clustering.F90.
| subroutine, public kmeans_clustering_oct_m::update_centroids | ( | class(mesh_t), intent(in) | mesh, | 
| real(real64), dimension(:), intent(in) | weight, | ||
| integer, dimension(:), intent(in) | ip_to_ic, | ||
| real(real64), dimension(:, :), intent(inout), contiguous | centroids | ||
| ) | 
Compute a new set of centroids.
A centroid is defined as:
\[ \mathbf{r}_\mu = \frac{\sum_{\mathbf{r}_j \in C_\mu} \mathbf{r}_j w(\mathbf{r}_j)}{\sum_{\mathbf{r}_j \in C_\mu} w(\mathbf{r}_j)} \]
where \(\mathbf{r}_j\) and \(w(\mathbf{r}_j\) are grid points and weights restricted to the cluster \( C_\mu \), respectively.
If using domain decomposition, the routine initially computes centroids using only the local set of grid points, then reduces this sum over all domains.
| [in] | mesh | Real-space grid instance | 
| [in] | weight | Weights (meshnp) | 
| [in] | ip_to_ic | Index array that maps grid indices to centroid indices | 
| [in,out] | centroids | In: centroid positions (spacedim, Ncentroids) | 
Definition at line 212 of file kmeans_clustering.F90.
      
  | 
  private | 
Compute the difference in two grids as \(abs(\mathbf{g}_1 - \mathbf{g}_2)\).
If a component of the difference vector for grid point ip is greater than the tolerance, the element points_differ(ip) is set to true.
| [in] | points | Real-space grid (n_dims, N) | 
| [in] | updated_points | Real-space grid (n_dims, N) | 
| [in] | tol | Tolerance | 
| [out] | points_differ | |a_i - b_i| | 
Definition at line 270 of file kmeans_clustering.F90.
      
  | 
  private | 
Report differences returned from compute_grid_difference. 
| [in] | points | Real-space grids (n_dims, N) | 
| [in] | points_differ | If any element of point_i > tol | 
Definition at line 301 of file kmeans_clustering.F90.
| subroutine, public kmeans_clustering_oct_m::weighted_kmeans | ( | class(space_t), intent(in) | space, | 
| class(mesh_t), intent(in) | mesh, | ||
| real(real64), dimension(:), intent(in) | weight, | ||
| real(real64), dimension(:, :), intent(inout), contiguous | centroids, | ||
| integer, intent(in), optional | n_iter, | ||
| real(real64), intent(in), optional | centroid_tol, | ||
| logical, intent(in), optional | discretize | ||
| ) | 
Weighted K-means clustering.
The K-means algorithm divides a set of \(N_r\) samples (in this case, grid points) into \(N_\mu\) disjoin clusters \(C\). The mean of each cluster defines the cluster centroid. Note that centroids are not, in general, points from the discrete grid
Given a grid, and some initial guess at centroids, the K-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion:
\[ argmin \sum^{N_\mu}_{\mu=1} \sum_{\mathbf{r}_\mathbf{k} \in C_\mu} || Z(\mathbf{r}_k) - Z(\mathbf{r}_\mu)||^2 \]
This implementation is based on Algorithm 1. given in [Complex-valued K-means clustering of interpolative separable density fitting algorithm for large-scale hybrid functional enabled ab initio molecular dynamics simulations within plane waves](https: however it is equivalent to implementations found in packages such as [scikit-learn](https:
The K-means algorithm consists of looping between two steps:
assign_points_to_centroids_finite_bcupdate_centroids The difference between the old and the new centroids are computed, and the algorithm repeats these last two steps until this value is less than a threshold.| [in] | space | Spatial dimensions and periodic dimensions | 
| [in] | mesh | Real-space grid instance | 
| [in] | weight | Weights (n_points) | 
| [in,out] | centroids | In: Initial centroids (n_dim, n_centroid) | 
| [in] | n_iter | Optional max number of iterations | 
| [in] | centroid_tol | Optional convergence criterion | 
| [in] | discretize | Optional Discretize centroid values to grid points | 
Definition at line 358 of file kmeans_clustering.F90.