Anagram Pairs [Solution]
Here’s a solution to the “Anagram Pairs” problem:
from typing import List
from collections import defaultdict
def count_anagram_pairs(strings: List[str]) -> int:
"""
Find the number of pairs of strings that are anagrams.
Parameters:
- strings: A list of strings.
Returns:
- int: The number of anagram pairs.
"""
# Dictionary to store the frequency of sorted characters in each string
char_freq_count = defaultdict(int)
# Counter for tracking the number of anagram pairs
anagram_pairs_count = 0
for string in strings:
# Sort the characters in the string and convert it to a tuple
sorted_tuple = tuple(sorted(string))
# Update the dictionary with the sorted character tuple
char_freq_count[sorted_tuple] += 1
# Count the number of anagram pairs based on the frequency of sorted character tuples
for count in char_freq_count.values():
anagram_pairs_count += count * (count - 1) // 2
return anagram_pairs_count
# Example usage:
result = count_anagram_pairs(["abc", "bca"])
print(result)
assert result == 1
result = count_anagram_pairs(["abc", "Bca"])
print(result)
assert result == 0
result = count_anagram_pairs(["abc"])
print(result)
assert result == 0
result = count_anagram_pairs(["abc", "cba", "def", "fed", "12", "21"])
print(result)
assert result == 3
result = count_anagram_pairs(["listen", "silent", "enlist", "hello", "world"])
print(result)
assert result == 3
result = count_anagram_pairs(["hello", "world", "dog", "god", "cat", "tac"])
print(result)
assert result == 2
In this solution, the characters in each string are sorted before creating a tuple, ensuring that anagrams will have the same sorted character tuples regardless of their original order.
Time Complexity:
The time complexity of the revised solution is O(N * M * log(M)
), where N
is the number of strings, and M
is the average length of the strings.
Here’s the breakdown:
- Iterating through each string:
- We iterate through each string in the input list once, which gives us a
factor of O(
N
).
- We iterate through each string in the input list once, which gives us a
factor of O(
- Sorting characters in each string:
- For each
string
, we sort its characters. The time complexity of sorting is O(M * log(M)
), whereM
is the length of the string. Since we do this for each of theN
strings, the overall complexity for sorting becomes O(N * M * log(M)
).
- For each
- Counting anagram pairs:
- After sorting, we count the occurrences of each sorted character tuple in
a
defaultdict
. The insertion and lookup operations in adefaultdict
are generally considered to be O(1
). However, since we iterate through eachstring
, and hash a tuple of lengthM
, the overall complexity for updating thedefaultdict
becomes O(N * M
).
- After sorting, we count the occurrences of each sorted character tuple in
a
- Final Time Complexity:
- Combining the complexities mentioned above, the overall time complexity is
O(
N * M * log(M)
).
- Combining the complexities mentioned above, the overall time complexity is
O(
In summary, the dominant factor in the time complexity is the sorting operation,
and the overall complexity is determined by the product of the number of strings
(N
), the average length of the strings (M
), and the logarithmic factor due
to sorting.
Space Complexity:
The space complexity of the solution is O(N
), where N
is the number of
strings. Here’s the breakdown:
char_freq_count
Dictionary:- We use a defaultdict named
char_freq_count
to store the frequency of sorted character tuples. The keys in this dictionary correspond to the sorted character tuples, and the values represent the count of occurrences. - In the worst case, all strings are unique, and there are
N
distinct sorted tuples. - Therefore, the space complexity for storing the frequencies is O(
N
).
- We use a defaultdict named
anagram_pairs_count
Variable:- We use a single variable
anagram_pairs_count
to keep track of the total number of anagram pairs. This variable is independent of the input size and requires constant space, O(1
).
- We use a single variable
- Other Variables:
- The other variables used in the function, such as the
string
andsorted_tuple
variables within the loops, are of constant size and do not depend on the input size.
- The other variables used in the function, such as the
- Final Space Complexity:
- Combining the complexities mentioned above, the overall space complexity is
O(
N
).
- Combining the complexities mentioned above, the overall space complexity is
O(
In summary, the dominant factor in the space complexity is the storage of
frequencies in the char_freq_count
dictionary.
The space required is proportional to the the number of strings (N
).