Here’s a Python implementation of the kth_smallest function using a min-heap:

import heapq
from typing import List

def kth_smallest(matrix: List[List[int]], k: int) -> int:
    rows, cols = len(matrix), len(matrix[0])
    
    # Create a min-heap to store the elements along with their coordinates (value, row, col)
    min_heap = [(matrix[0][j], 0, j) for j in range(cols)]
    heapq.heapify(min_heap)

    # Perform k-1 pop operations to reach the kth smallest element
    for _ in range(k - 1):
        val, row, col = heapq.heappop(min_heap)
        # If there is a next row, add the element from the next row
        if row + 1 < rows:
            heapq.heappush(min_heap, (matrix[row + 1][col], row + 1, col))

    # The kth smallest element is at the top of the heap
    return heapq.heappop(min_heap)[0]

# Example usage:
matrix1 = [
    [1, 5, 9],
    [10, 11, 13],
    [12, 13, 15]
]
result1 = kth_smallest(matrix1, 8)
print(result1)  # Output: 13

matrix2 = [
    [1, 3, 5],
    [6, 7, 12],
    [11, 14, 14]
]
result2 = kth_smallest(matrix2, 2)
print(result2)  # Output: 3

matrix3 = [
    [ 1,  5,  9],
    [ 2,  6, 10],
    [ 4,  7, 11]
]
result3 = kth_smallest(matrix3, 3)
print(result3)  # Output: 4

This solution uses a min-heap to keep track of the smallest elements from each column. We start with the first row, and for each pop operation, we add the element from the next row in the same column to the heap. After k - 1 such operations, the kth smallest element is at the top of the heap. The time complexity is O(k * log(cols)), where cols is the number of columns, and the space complexity is O(cols).

More detailed explanation on the time complexity analysis:

Let’s break down the time complexity analysis:

  • We start by creating a min-heap with elements from the first row, which takes O(cols * log(cols)) time. This is because we insert each element individually into the heap, and each insertion takes O(log(cols)) time.
  • We perform k - 1 pop operations from the heap. In each pop operation, we pop the smallest element from the heap and add the element from the next row in the same column. Popping and pushing elements into the heap take O(log(cols)) time each.
  • Since we perform k - 1 pop operations, the total time complexity for popping and pushing elements is O((k-1) * log(cols)).

Therefore, the overall time complexity is O(cols * log(cols) + (k-1) * log(cols)). In the worst case, when k is close to n^2 (the total number of elements in the matrix), the dominating factor is the heap operations, and the time complexity can be approximated as O(k * log(cols)).

It’s worth noting that if k is much smaller than n^2, the time complexity would be closer to O(cols * log(cols)), but we include the (k-1) factor to provide a more general analysis that considers the worst case.