Functions

KMeans

supar.utils.fn.kmeans(x, k, max_it=32)

KMeans algorithm for clustering sentences by their lengths.

Parameters

x (list[int]) – The list of sentence lengths.
k (int) – The number of clusters. This is an approximate value: the final number of clusters can be less than or equal to k.
max_it (int) – Maximum number of iterations. If the centroids have not converged after max_it iterations, the algorithm stops early.

Returns

The first list contains the average sentence length of each cluster. The second is the list of clusters, each holding the indices of its data points.

Return type

list[float], list[list[int]]

Examples

>>> import torch
>>> from supar.utils.fn import kmeans
>>> x = torch.randint(10, 20, (10,)).tolist()
>>> x
[15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
>>> centroids, clusters = kmeans(x, 3)
>>> centroids
[10.5, 14.0, 17.799999237060547]
>>> clusters
[[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]
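For intuition, the same idea can be sketched in plain Python as Lloyd's algorithm on one-dimensional data. This is a minimal sketch, not supar's implementation, which works on tensors and may initialise its centroids differently. The function name `kmeans_1d` and the choice of spreading the initial centroids evenly over the distinct lengths are assumptions of this sketch. Empty clusters are dropped after each assignment step, which is one way the final number of clusters can end up below k.

```python
from typing import List, Tuple


def kmeans_1d(x: List[int], k: int, max_it: int = 32) -> Tuple[List[float], List[List[int]]]:
    """Hypothetical sketch: cluster 1-D lengths into at most k groups (Lloyd's algorithm)."""
    if not x:
        return [], []
    # There cannot be more useful clusters than distinct lengths; keep at least one.
    distinct = sorted(set(x))
    k = max(1, min(k, len(distinct)))
    # Assumed initialisation: centroids spread evenly over the sorted distinct lengths.
    step = max(k - 1, 1)
    centroids = [float(distinct[i * (len(distinct) - 1) // step]) for i in range(k)]

    clusters: List[List[int]] = []
    for _ in range(max_it):
        # Assignment step: attach each data point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for i, length in enumerate(x):
            nearest = min(range(len(centroids)), key=lambda c: abs(length - centroids[c]))
            clusters[nearest].append(i)
        # Drop empty clusters, so the final count may be smaller than k.
        clusters = [c for c in clusters if c]
        # Update step: move each centroid to the mean length of its cluster.
        new_centroids = [sum(x[i] for i in c) / len(c) for c in clusters]
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans_1d([15, 10, 17, 11, 18, 13, 17, 19, 18, 14], 3)
print(centroids)  # [10.5, 14.0, 17.8]
print(clusters)   # [[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]
```

On the lengths from the example above, this sketch converges in two iterations to the same three clusters. The last centroid prints as 17.8 rather than 17.799999237060547 because plain Python floats are double precision, whereas the documented output reflects float32 tensor arithmetic.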

Stripe

supar.utils.fn.stripe(x, n, w, offset=(0, 0), dim=1)

Returns a diagonal stripe of the tensor.