

supar.utils.fn.kmeans(x: List[int], k: int, max_it: int = 32) → Tuple[List[float], List[List[int]]]

K-means algorithm for clustering sentences by length.

Parameters

  • x (List[int]) – The list of sentence lengths.

  • k (int) – The requested number of clusters. This value is approximate: the final number of clusters may be less than or equal to k.

  • max_it (int) – Maximum number of iterations. If the centroids do not converge within max_it iterations, the algorithm stops early. Default: 32.


Returns

A tuple of two lists. The first contains the average sentence length of each cluster. The second contains the clusters themselves, each holding the indices of the data points assigned to it.

Return type

Tuple[List[float], List[List[int]]]


>>> x = torch.randint(10, 20, (10,)).tolist()
>>> x
[15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
>>> centroids, clusters = kmeans(x, 3)
>>> centroids
[10.5, 14.0, 17.799999237060547]
>>> clusters
[[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]
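
The clustering described above can be illustrated with a minimal pure-Python sketch of one-dimensional k-means. This is not SuPar's implementation: the name kmeans_1d, the evenly spaced initialisation, and the tie-breaking rule are assumptions made for illustration only. It does show the two documented behaviours: the number of clusters may end up smaller than k, and iteration stops after max_it rounds if the centroids have not converged.

```python
from typing import List, Tuple


def kmeans_1d(x: List[int], k: int, max_it: int = 32) -> Tuple[List[float], List[List[int]]]:
    """Hypothetical 1-D k-means over sentence lengths (illustrative only)."""
    # Assumed initialisation: pick up to k distinct lengths, evenly spaced
    # over the sorted unique values.
    uniq = sorted(set(x))
    k = min(k, len(uniq))
    centroids = [float(uniq[i * len(uniq) // k]) for i in range(k)]
    clusters: List[List[int]] = []
    for _ in range(max_it):
        # Assignment step: each index goes to its nearest centroid
        # (ties go to the lower-numbered centroid).
        clusters = [[] for _ in centroids]
        for i, v in enumerate(x):
            nearest = min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(i)
        # Drop empty clusters, so the final count can be smaller than k.
        clusters = [c for c in clusters if c]
        # Update step: each centroid becomes the mean length of its cluster.
        updated = [sum(x[i] for i in c) / len(c) for c in clusters]
        if updated == centroids:
            break  # converged before reaching max_it
        centroids = updated
    return centroids, clusters


lengths = [15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
centroids, clusters = kmeans_1d(lengths, 3)
print(centroids)  # [10.5, 14.0, 17.8]
print(clusters)   # [[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]
```

On this particular input the sketch happens to produce the same grouping as the example above. With other inputs or a different initialisation the result may differ from what supar.utils.fn.kmeans returns.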


supar.utils.fn.stripe(x: torch.Tensor, n: int, w: int, offset: Tuple = (0, 0), horizontal: bool = True) → torch.Tensor

Returns a parallelogram stripe of the tensor.

Parameters

  • x (Tensor) – The input tensor with 2 or more dims.

  • n (int) – The length of the stripe.

  • w (int) – The width of the stripe.

  • offset (tuple) – The offset of the stripe's starting position along the first two dims. Default: (0, 0).

  • horizontal (bool) – If True, returns a horizontal stripe; otherwise returns a vertical stripe. Default: True.


Returns

A parallelogram stripe of the tensor, with size (n, w) along the first two dims.


>>> x = torch.arange(25).view(5, 5)
>>> x
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]])
>>> stripe(x, 2, 3)
tensor([[0, 1, 2],
        [6, 7, 8]])
>>> stripe(x, 2, 3, (1, 1))
tensor([[ 6,  7,  8],
        [12, 13, 14]])
>>> stripe(x, 2, 3, (1, 1), 0)
tensor([[ 6, 11, 16],
        [12, 17, 22]])
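
The indexing pattern behind these outputs can be spelled out with a small pure-Python sketch over nested lists. The helper name stripe_lists is hypothetical, and the explicit loops are an assumption made for readability; they are not how the library computes the result. Row i of the stripe starts one step further along the diagonal than row i - 1 and then extends w elements to the right (horizontal) or downward (vertical).

```python
from typing import List, Tuple


def stripe_lists(x: List[List[int]], n: int, w: int,
                 offset: Tuple[int, int] = (0, 0),
                 horizontal: bool = True) -> List[List[int]]:
    """Hypothetical list-based illustration of a parallelogram stripe."""
    r0, c0 = offset
    out = []
    for i in range(n):          # i-th row of the stripe starts at (r0 + i, c0 + i)
        row = []
        for j in range(w):      # then extends w elements along one axis
            if horizontal:
                row.append(x[r0 + i][c0 + i + j])   # move right
            else:
                row.append(x[r0 + i + j][c0 + i])   # move down
        out.append(row)
    return out


# Same 5 x 5 grid as torch.arange(25).view(5, 5), as nested lists.
x = [[5 * r + c for c in range(5)] for r in range(5)]
print(stripe_lists(x, 2, 3))                 # [[0, 1, 2], [6, 7, 8]]
print(stripe_lists(x, 2, 3, (1, 1)))         # [[6, 7, 8], [12, 13, 14]]
print(stripe_lists(x, 2, 3, (1, 1), False))  # [[6, 11, 16], [12, 17, 22]]
```

The three calls mirror the tensor examples above and give the same values, which makes it easier to see why consecutive rows of the stripe are shifted by one position.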