Merge Sorted Arrays [Solution]

Solution 1: Heap

To solve the “Merge Sorted Arrays” problem, we can use a min heap to efficiently merge the sorted arrays. We will maintain a heap of size k where each element in the heap is a tuple (value, array_index, element_index) representing the current element and its position in the array.

import heapq
from typing import List

def merge_sorted_arrays(arrays: List[List[int]]) -> List[int]:
    # The merged sorted array
    result = []
    # Min heap to keep track of the smallest element from each array
    heap = []

    # Initialize the heap with the first element from each array
    for i, array in enumerate(arrays):
        if array:
            heapq.heappush(heap, (array[0], i, 0))

    while heap:
    	# Pop the smallest element from the heap
        val, array_idx, elem_idx = heapq.heappop(heap)
        # Add the smallest element to the result
        result.append(val)

        # Move to the next element in the same array
        if elem_idx + 1 < len(arrays[array_idx]):
            heapq.heappush(heap, (arrays[array_idx][elem_idx + 1], array_idx, elem_idx + 1))

    return result

# Example usage:
arrays = [
    [1, 3, 5],
    [2, 4, 6],
    [0, 7, 8]
]

result = merge_sorted_arrays(arrays)
print(result)
assert result == [0, 1, 2, 3, 4, 5, 6, 7, 8]

Explanation:

We use a min heap to keep track of the smallest element from each array.
The heap is initialized with the first element from each array along with their array index and element index.
In each iteration, we pop the smallest element from the heap, add it to the result, and push the next element from the same array to the heap if available.
This process continues until the heap is empty, and the result is a merged sorted array.

Time Complexity:

The time complexity is O(N log k), where N is the total number of elements across all arrays, and k is the number of arrays. We perform at most N heap operations, and each operation takes log k time.

Here is the breakdown:

Initializing the Heap:
- We initialize the heap with the first element from each array. This operation takes O(k * log k) time, where k is the number of arrays.
- For each array, we perform a heappush operation, which takes log k time. We do this for all k arrays.
Heap Operations in the While Loop:
- In each iteration of the while loop, we perform a heappop operation, which takes log k time.
- If the popped element has a next element in its array, we perform a heappush operation for that next element. Again, this takes log k time.
Total Operations:
- The while loop runs until the heap is empty, and in each iteration, we perform log k operations.
- In the worst case, we might need to process all N elements (across all arrays), and each element is pushed and popped once.
- Therefore, the total number of heap operations is O(N log k).
Overall Time Complexity:
- The overall time complexity is O(k * log k + N * log k), where k is the number of arrays and N is the total number of elements.
- This complexity arises because the dominant factor is the merging of N elements across all arrays, and each operation involving the heap takes log k time.
- Note: The base of the logarithm is typically omitted in Big O notation, so log k generally refers to log base 2.

Space Complexity:

The space complexity is O(k), where k is the number of arrays. The heap can have at most k elements at any given time.

Solution 2: Priority Queue

We can also achieve similar time complexity of O(N log k) without using a heap by using a different approach. We can use a priority queue to keep track of the current smallest element from each array. We’ll maintain the priority queue based on the first element of each array and pop the smallest element at each step.

Here’s the modified implementation:

from queue import PriorityQueue
from typing import List

def merge_sorted_arrays(arrays: List[List[int]]) -> List[int]:
    # Resulting merged array
    result = []

    # Priority queue to keep track of the smallest element from each array
    priority_queue = PriorityQueue()

    # Initialize the priority queue with the first element from each array
    for i, array in enumerate(arrays):
        if array:
            priority_queue.put((array[0], i, 0))

    # Continue until the priority queue is empty
    while not priority_queue.empty():
        # Pop the smallest element from the priority queue
        val, array_idx, elem_idx = priority_queue.get()

        # Add the popped element to the result
        result.append(val)

        # Move to the next element in the same array
        if elem_idx + 1 < len(arrays[array_idx]):
            priority_queue.put((arrays[array_idx][elem_idx + 1], array_idx, elem_idx + 1))

    return result

# Example usage:
arrays = [
    [1, 3, 5],
    [2, 4, 6],
    [0, 7, 8]
]

result = merge_sorted_arrays(arrays)
print(result)
assert result == [0, 1, 2, 3, 4, 5, 6, 7, 8]

Explanation:

The result list is used to store the merged sorted array.
The priority_queue is a priority queue (implemented using PriorityQueue from the queue module) to keep track of the smallest element from each array along with its array index and element index.
The priority queue is initialized with the first element from each array.
The loop continues until the priority queue is empty, and in each iteration, the smallest element is popped from the priority queue, added to the result, and the next element from the same array is pushed to the priority queue if available.

Time Complexity:

The time complexity of the provided solution is O(N log k), where N is the total number of elements across all arrays, and k is the number of arrays.

PriorityQueue Operations: Inserting an element into the priority queue or extracting the minimum element takes O(log k) time, where k is the number of arrays.
Number of Elements: We perform these operations for each element in the resulting merged array. Therefore, for N total elements, the time complexity becomes O(N log k).

Space Complexity:

The space complexity is O(k), where k is the number of arrays. This accounts for the space used by the priority queue, which can have at most k elements at any given time.

PriorityQueue Space: The space required for the priority queue is proportional to the number of arrays (k) since we only store one element from each array at any given time.
Result Array: The space required for the result array is O(N), where N is the total number of elements across all arrays.

Overall, the dominating factor for space complexity is the priority queue, leading to a space complexity of O(k).

Solution 3: O(`1`) Space Complexity

We can solve the problem above with O(1) space complexity without using a heap or priority queue. Instead, we can iterate through all the arrays simultaneously and pick the smallest element at each step.

Here’s the implementation:

from typing import List

def merge_sorted_arrays(arrays: List[List[int]]) -> List[int]:
    # To store the merged sorted array
    result = []
    # To keep track of the current index for each array
    indices = [0] * len(arrays)

    # Continue until there are elements left in any of the arrays
    while any(indices[i] < len(arrays[i]) for i in range(len(arrays))):
        # Initialize the minimum value to positive infinity
        min_val = float('inf')
        # Initialize the index of the array with the minimum value
        min_idx = -1

        # Iterate through each array and find the minimum value
        for i in range(len(arrays)):
            # Check if there are elements left in the current array
            if indices[i] < len(arrays[i]) and arrays[i][indices[i]] < min_val:
                min_val = arrays[i][indices[i]]
                min_idx = i

        # Append the minimum value to the result
        result.append(min_val)
        # Increment the index of the array from which the minimum value was picked
        indices[min_idx] += 1

    return result

# Example usage:
arrays = [
    [1, 3, 5],
    [2, 4, 6],
    [0, 7, 8]
]

result = merge_sorted_arrays(arrays)
print(result)
assert result == [0, 1, 2, 3, 4, 5, 6, 7, 8]

Explanation:

We use an indices list to keep track of the current index for each array.
In each iteration, we find the minimum value among the elements at the current indices of all arrays.
We append the minimum value to the result and increment the index of the array from which we picked the element.
We continue this process until all arrays are exhausted.

Time Complexity:

The time complexity is O(N * k), where N is the total number of elements across all arrays, and k is the number of arrays. In each iteration, we perform a constant number of comparisons for each array.

Space Complexity:

The space complexity is O(1) as we do not use any additional data structures besides the input and output arrays.

Solution 4: Divide and Conquer, O(`1`) Space and O(`N log k`) Time Complexity

We can achieve O(1) space complexity and O(N log k) time complexity by using a modified divide and conquer approach. We’ll repeatedly merge pairs of sorted arrays until we have a single merged array.

Here’s the implementation:

from typing import List

def merge_sorted_arrays(arrays: List[List[int]]) -> List[int]:
    # Check if the list of arrays is empty
    if not arrays:
        return []

    # Define a helper function to merge two sorted arrays
    def merge_two_arrays(arr1: List[int], arr2: List[int]) -> List[int]:
        result = []
        i = j = 0

        # Merge elements from arr1 and arr2 in sorted order
        while i < len(arr1) and j < len(arr2):
            if arr1[i] < arr2[j]:
                result.append(arr1[i])
                i += 1
            else:
                result.append(arr2[j])
                j += 1

        # Append remaining elements, if any
        result.extend(arr1[i:])
        result.extend(arr2[j:])
        return result

    # Repeatedly merge pairs of arrays until only one array remains
    while len(arrays) > 1:
        merged_arrays = []

        # Merge pairs of arrays
        for i in range(0, len(arrays), 2):
            if i + 1 < len(arrays):
                merged = merge_two_arrays(arrays[i], arrays[i + 1])
            else:
                # If there is an odd number of arrays, keep the last array as is
                merged = arrays[i]
            merged_arrays.append(merged)

        arrays = merged_arrays

    # The final result is the merged array
    return arrays[0]

# Example usage:
arrays = [
    [1, 3, 5],
    [2, 4, 6],
    [0, 7, 8]
]

result = merge_sorted_arrays(arrays)
# Merged Sorted Array: [0, 1, 2, 3, 4, 5, 6, 7, 8]
assert result == [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(result)

Explanation:

The merge_two_arrays function merges two sorted arrays into a single sorted array.
The main function repeatedly merges pairs of arrays until only one array remains.

Time Complexity:

The time complexity is O(N log k), where N is the total number of elements across all arrays, and k is the number of arrays. In each iteration, we perform O(N) comparisons, and we repeat this process log k times.

Here is the breakdown:

Merging Pairs of Arrays:
- In each iteration of the main loop, we merge pairs of sorted arrays. The number of comparisons made during this merging step is proportional to the total number of elements across all arrays, denoted as N.
- Therefore, the time complexity for merging pairs of arrays is O(N).
Number of Iterations:
- In each iteration, the number of remaining arrays is reduced by half. The iterations continue until only one array remains.
- The number of iterations is log k, where k is the number of arrays.
Overall Time Complexity:
- The overall time complexity is O(N log k), where N is the total number of elements across all arrays, and k is the number of arrays.
- This is because we perform O(N) work in each of the log k iterations.

This divide and conquer approach efficiently merges sorted arrays, and the time complexity is kept within O(N log k).

Space Complexity:

The space complexity is O(1) as we are not using any additional data structures except for a few variables.

Solution 1: Heap

Solution 2: Priority Queue

Solution 3: O(1) Space Complexity

Solution 4: Divide and Conquer, O(1) Space and O(N log k) Time Complexity

Solution 3: O(`1`) Space Complexity

Solution 4: Divide and Conquer, O(`1`) Space and O(`N log k`) Time Complexity