Functions

KMeans

supar.utils.fn.kmeans(x, k, max_it=32)[source]

KMeans algorithm for clustering the sentences by length.

Parameters
  • x (list[int]) – The list of sentence lengths.

  • k (int) – The number of clusters. This is an approximate value. The final number of clusters can be less than or equal to k.

  • max_it (int) – Maximum number of iterations. If the centroids do not converge within max_it iterations, the algorithm stops early.

Returns

The first list contains the average sentence length of each cluster. The second list contains the clusters, each holding the indices (into x) of the data points assigned to it.

Return type

list[float], list[list[int]]

Examples

>>> x = torch.randint(10,20,(10,)).tolist()
>>> x
[15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
>>> centroids, clusters = kmeans(x, 3)
>>> centroids
[10.5, 14.0, 17.799999237060547]
>>> clusters
[[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]
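
The clustering above can be illustrated with a minimal pure-Python sketch of one-dimensional k-means. This is not the library's implementation: the name kmeans_sketch, the deterministic initialization (centroids spread evenly over the distinct lengths), and the handling of empty clusters are all assumptions made for illustration. It also assumes a non-empty x and k >= 1.

```python
def kmeans_sketch(x, k, max_it=32):
    """Cluster integer lengths in x into at most k groups (illustrative only)."""
    if max_it < 1:
        raise ValueError("max_it must be at least 1")
    unique = sorted(set(x))
    # There cannot be more clusters than distinct lengths.
    k = min(k, len(unique))
    # Assumption: start from the midpoints of k equal chunks of the distinct lengths.
    centroids = [float(unique[(2 * i + 1) * len(unique) // (2 * k)]) for i in range(k)]
    for _ in range(max_it):
        # Assignment step: each index goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for i, length in enumerate(x):
            nearest = min(range(len(centroids)), key=lambda c: abs(length - centroids[c]))
            clusters[nearest].append(i)
        # Dropping empty clusters is why the result may have fewer than k clusters.
        clusters = [c for c in clusters if c]
        # Update step: each centroid becomes the mean length of its cluster.
        new_centroids = [sum(x[i] for i in c) / len(c) for c in clusters]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters


x = [15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
centroids, clusters = kmeans_sketch(x, 3)
print(centroids)  # [10.5, 14.0, 17.8]
print(clusters)   # [[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]
```

With this particular input the sketch happens to reach the same clusters as the example above. The third centroid prints as 17.8 here because plain Python floats are double precision; the value 17.799999237060547 in the example is what 17.8 looks like after single-precision rounding, presumably because the library computes the means with tensors.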

Stripe

supar.utils.fn.stripe(x, n, w, offset=(0, 0), dim=1)[source]

Returns a diagonal stripe of the tensor.

Parameters
  • x (Tensor) – the input tensor with 2 or more dims.

  • n (int) – the length of the stripe.

  • w (int) – the width of the stripe.

  • offset (tuple) – the (row, column) offset of the stripe's starting position in the first two dims.

  • dim (int) – 1 to return a horizontal stripe; 0 to return a vertical one.

Returns

A diagonal stripe of the tensor.

Examples

>>> x = torch.arange(25).view(5, 5)
>>> x
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]])
>>> stripe(x, 2, 3)
tensor([[0, 1, 2],
        [6, 7, 8]])
>>> stripe(x, 2, 3, (1, 1))
tensor([[ 6,  7,  8],
        [12, 13, 14]])
>>> stripe(x, 2, 3, (1, 1), 0)
tensor([[ 6, 11, 16],
        [12, 17, 22]])
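
The indexing pattern in these examples can be spelled out with a small pure-Python sketch that works on a 2-D nested list instead of a tensor. The helper name stripe_sketch is hypothetical, and the sketch only mirrors what the examples show (row i of the result starts i steps down the diagonal from offset, then runs w elements to the right when dim is 1 or downward when dim is 0). It makes no claim about how the library computes the stripe.

```python
def stripe_sketch(x, n, w, offset=(0, 0), dim=1):
    """Reference indexing for a diagonal stripe of a 2-D nested list (illustrative only)."""
    r0, c0 = offset
    out = []
    for i in range(n):
        # Row i of the stripe starts on the diagonal, i steps from the offset.
        r, c = r0 + i, c0 + i
        if dim == 1:
            out.append([x[r][c + j] for j in range(w)])  # walk right along the row
        else:
            out.append([x[r + j][c] for j in range(w)])  # walk down along the column
    return out


# Same 5 x 5 grid as torch.arange(25).view(5, 5), built as nested lists.
x = [[5 * r + c for c in range(5)] for r in range(5)]
print(stripe_sketch(x, 2, 3))             # [[0, 1, 2], [6, 7, 8]]
print(stripe_sketch(x, 2, 3, (1, 1)))     # [[6, 7, 8], [12, 13, 14]]
print(stripe_sketch(x, 2, 3, (1, 1), 0))  # [[6, 11, 16], [12, 17, 22]]
```

Unlike a strided view over a tensor, this sketch copies the elements and raises IndexError if the stripe runs past the edge of the grid.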