Merge Sorted Arrays [Solution]
Solution 1: Heap
To solve the “Merge Sorted Arrays” problem, we can use a min heap to efficiently merge the sorted arrays.
We will maintain a heap of size k
where each element in the heap is a tuple
(value, array_index, element_index)
representing the current element and its position in the array.
import heapq
from typing import List
def merge_sorted_arrays(arrays: List[List[int]]) -> List[int]:
# The merged sorted array
result = []
# Min heap to keep track of the smallest element from each array
heap = []
# Initialize the heap with the first element from each array
for i, array in enumerate(arrays):
if array:
heapq.heappush(heap, (array[0], i, 0))
while heap:
# Pop the smallest element from the heap
val, array_idx, elem_idx = heapq.heappop(heap)
# Add the smallest element to the result
result.append(val)
# Move to the next element in the same array
if elem_idx + 1 < len(arrays[array_idx]):
heapq.heappush(heap, (arrays[array_idx][elem_idx + 1], array_idx, elem_idx + 1))
return result
# Example usage:
arrays = [
[1, 3, 5],
[2, 4, 6],
[0, 7, 8]
]
result = merge_sorted_arrays(arrays)
print(result)
assert result == [0, 1, 2, 3, 4, 5, 6, 7, 8]
Explanation:
- We use a min heap to keep track of the smallest element from each array.
- The heap is initialized with the first element from each array along with their array index and element index.
- In each iteration, we pop the smallest element from the heap, add it to the result, and push the next element from the same array to the heap if available.
- This process continues until the heap is empty, and the result is a merged sorted array.
Time Complexity:
The time complexity is O(N log k
), where N
is the total number of elements across all arrays, and k
is the number of arrays. We perform at most N
heap operations, and each operation takes log k
time.
Here is the breakdown:
- Initializing the Heap:
- We initialize the heap with the first element from each array. This operation takes O(
k * log k
) time, wherek
is the number of arrays. - For each array, we perform a
heappush
operation, which takeslog k
time. We do this for allk
arrays.
- We initialize the heap with the first element from each array. This operation takes O(
- Heap Operations in the While Loop:
- In each iteration of the
while
loop, we perform aheappop
operation, which takeslog k
time. - If the popped element has a next element in its array, we perform a
heappush
operation for that next element. Again, this takeslog k
time.
- In each iteration of the
- Total Operations:
- The
while
loop runs until the heap is empty, and in each iteration, we performlog k
operations. - In the worst case, we might need to process all
N
elements (across all arrays), and each element is pushed and popped once. - Therefore, the total number of heap operations is O(
N log k
).
- The
- Overall Time Complexity:
- The overall time complexity is O(
k * log k + N * log k
), wherek
is the number of arrays andN
is the total number of elements. - This complexity arises because the dominant factor is the merging of
N
elements across all arrays, and each operation involving the heap takeslog k
time. - Note: The base of the logarithm is typically omitted in Big O notation, so
log k
generally refers tolog
base2
.
- The overall time complexity is O(
Space Complexity:
The space complexity is O(k
), where k
is the number of arrays.
The heap can have at most k
elements at any given time.
Solution 2: Priority Queue
We can also achieve similar time complexity of O(N log k
) without using a heap by using a different approach.
We can use a priority queue to keep track of the current smallest element from each array.
We’ll maintain the priority queue based on the first element of each array and pop the smallest element at each step.
Here’s the modified implementation:
from queue import PriorityQueue
from typing import List
def merge_sorted_arrays(arrays: List[List[int]]) -> List[int]:
# Resulting merged array
result = []
# Priority queue to keep track of the smallest element from each array
priority_queue = PriorityQueue()
# Initialize the priority queue with the first element from each array
for i, array in enumerate(arrays):
if array:
priority_queue.put((array[0], i, 0))
# Continue until the priority queue is empty
while not priority_queue.empty():
# Pop the smallest element from the priority queue
val, array_idx, elem_idx = priority_queue.get()
# Add the popped element to the result
result.append(val)
# Move to the next element in the same array
if elem_idx + 1 < len(arrays[array_idx]):
priority_queue.put((arrays[array_idx][elem_idx + 1], array_idx, elem_idx + 1))
return result
# Example usage:
arrays = [
[1, 3, 5],
[2, 4, 6],
[0, 7, 8]
]
result = merge_sorted_arrays(arrays)
print(result)
assert result == [0, 1, 2, 3, 4, 5, 6, 7, 8]
Explanation:
- The result list is used to store the merged sorted array.
- The
priority_queue
is a priority queue (implemented usingPriorityQueue
from the queue module) to keep track of the smallest element from each array along with its array index and element index. - The priority queue is initialized with the first element from each array.
- The loop continues until the priority queue is empty, and in each iteration, the smallest element is popped from the priority queue, added to the result, and the next element from the same array is pushed to the priority queue if available.
Time Complexity:
The time complexity of the provided solution is O(N log k
), where N
is the total number of elements across all arrays, and k
is the number of arrays.
PriorityQueue
Operations: Inserting an element into the priority queue or extracting the minimum element takes O(log k
) time, wherek
is the number of arrays.- Number of Elements: We perform these operations for each element in the resulting merged array. Therefore, for
N
total elements, the time complexity becomes O(N log k
).
Space Complexity:
The space complexity is O(k
), where k
is the number of arrays. This accounts for the space used by the priority queue, which can have at most k
elements at any given time.
PriorityQueue
Space: The space required for the priority queue is proportional to the number of arrays (k
) since we only store one element from each array at any given time.- Result Array: The space required for the
result
array is O(N
), whereN
is the total number of elements across all arrays.
Overall, the dominating factor for space complexity is the priority queue, leading to a space complexity of O(k
).
Solution 3: O(1
) Space Complexity
We can solve the problem above with O(1
) space complexity without using a heap or priority queue.
Instead, we can iterate through all the arrays simultaneously and pick the smallest element at each step.
Here’s the implementation:
from typing import List
def merge_sorted_arrays(arrays: List[List[int]]) -> List[int]:
# To store the merged sorted array
result = []
# To keep track of the current index for each array
indices = [0] * len(arrays)
# Continue until there are elements left in any of the arrays
while any(indices[i] < len(arrays[i]) for i in range(len(arrays))):
# Initialize the minimum value to positive infinity
min_val = float('inf')
# Initialize the index of the array with the minimum value
min_idx = -1
# Iterate through each array and find the minimum value
for i in range(len(arrays)):
# Check if there are elements left in the current array
if indices[i] < len(arrays[i]) and arrays[i][indices[i]] < min_val:
min_val = arrays[i][indices[i]]
min_idx = i
# Append the minimum value to the result
result.append(min_val)
# Increment the index of the array from which the minimum value was picked
indices[min_idx] += 1
return result
# Example usage:
arrays = [
[1, 3, 5],
[2, 4, 6],
[0, 7, 8]
]
result = merge_sorted_arrays(arrays)
print(result)
assert result == [0, 1, 2, 3, 4, 5, 6, 7, 8]
Explanation:
- We use an
indices
list to keep track of the current index for each array. - In each iteration, we find the minimum value among the elements at the current indices of all arrays.
- We append the minimum value to the result and increment the index of the array from which we picked the element.
- We continue this process until all arrays are exhausted.
Time Complexity:
The time complexity is O(N * k
), where N
is the total number of elements across all arrays,
and k
is the number of arrays. In each iteration, we perform a constant number of comparisons for each array.
Space Complexity:
The space complexity is O(1
) as we do not use any additional data structures besides the input and output arrays.
Solution 4: Divide and Conquer, O(1
) Space and O(N log k
) Time Complexity
We can achieve O(1
) space complexity and O(N log k
) time complexity
by using a modified divide and conquer approach.
We’ll repeatedly merge pairs of sorted arrays until we have a single merged array.
Here’s the implementation:
from typing import List
def merge_sorted_arrays(arrays: List[List[int]]) -> List[int]:
# Check if the list of arrays is empty
if not arrays:
return []
# Define a helper function to merge two sorted arrays
def merge_two_arrays(arr1: List[int], arr2: List[int]) -> List[int]:
result = []
i = j = 0
# Merge elements from arr1 and arr2 in sorted order
while i < len(arr1) and j < len(arr2):
if arr1[i] < arr2[j]:
result.append(arr1[i])
i += 1
else:
result.append(arr2[j])
j += 1
# Append remaining elements, if any
result.extend(arr1[i:])
result.extend(arr2[j:])
return result
# Repeatedly merge pairs of arrays until only one array remains
while len(arrays) > 1:
merged_arrays = []
# Merge pairs of arrays
for i in range(0, len(arrays), 2):
if i + 1 < len(arrays):
merged = merge_two_arrays(arrays[i], arrays[i + 1])
else:
# If there is an odd number of arrays, keep the last array as is
merged = arrays[i]
merged_arrays.append(merged)
arrays = merged_arrays
# The final result is the merged array
return arrays[0]
# Example usage:
arrays = [
[1, 3, 5],
[2, 4, 6],
[0, 7, 8]
]
result = merge_sorted_arrays(arrays)
# Merged Sorted Array: [0, 1, 2, 3, 4, 5, 6, 7, 8]
assert result == [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(result)
Explanation:
- The
merge_two_arrays
function merges two sorted arrays into a single sorted array. - The main function repeatedly merges pairs of arrays until only one array remains.
Time Complexity:
The time complexity is O(N log k
), where N
is the total number of elements across all arrays,
and k
is the number of arrays.
In each iteration, we perform O(N
) comparisons, and we repeat this process log k
times.
Here is the breakdown:
- Merging Pairs of Arrays:
- In each iteration of the main loop, we merge pairs of sorted arrays. The number of comparisons made during this merging step is proportional to the total number of elements across all arrays, denoted as
N
. - Therefore, the time complexity for merging pairs of arrays is O(
N
).
- In each iteration of the main loop, we merge pairs of sorted arrays. The number of comparisons made during this merging step is proportional to the total number of elements across all arrays, denoted as
- Number of Iterations:
- In each iteration, the number of remaining arrays is reduced by half. The iterations continue until only one array remains.
- The number of iterations is
log k
, wherek
is the number of arrays.
- Overall Time Complexity:
- The overall time complexity is O(
N log k
), whereN
is the total number of elements across all arrays, andk
is the number of arrays. - This is because we perform O(
N
) work in each of thelog k
iterations.
- The overall time complexity is O(
This divide and conquer approach efficiently merges sorted arrays, and the time complexity is kept within O(N log k
).
Space Complexity:
The space complexity is O(1
) as we are not using any additional data structures except for a few variables.