Longest Common String Subsequence
Problem:
Write a function that takes two string arrays as input and returns the longest contiguous sequence of strings that appear in both.
For example, given the following two arrays:
a = [“/home”, “/register”, “/login”, “/user”, “/one”, “/two”]
b = [“/home”, “/red”, “/login”, “/user”, “/one”, “/pink”]
You should return the following:
[“/login”, “/user”, “/one”]
Approach:
The key insight comes from asking: “Is there a recursive substructure to the problem that we can exploit?” and “Can we cache intermediate results of the recursive calls?”
- Consider locations
i
andj
in arraysa
andb
respectively. How can we find the longest common subsequence of arrays starting ata[i]
andb[j]
? - if
a[i] == b[j]
then the length of optimal subsequence is one plus the length of the longest common subsequence of arrays starting ata[i+1]
andb[j+1]
. - Otherwise, the optimal subsequence is empty.
- We can use Dynamic Programming to cache intermediate computation of the recursive call in a hash table to be reused later.
Once you have the recursive function, the longest common subsequence can be found by repeatedly calling it on the two arrays:
Iterative Implementation with Compact Loops:
What is the complexity of the above implementation? We call the longest_recursive
function for every (i, j)
pair of indices of the original arrays. Hence, the complexity is at least O(MN)
where M
and N
are the sizes of the two arrays. Although we are making further recursive calls to longest_recursive
for each (i, j)
, note that all but one of those calls return immediately by using a constant-time lookup in the table, and hence the total complexity is O(MN)
.
This complexity analysis underscores the limitations of the recursive implementation (although it’s pretty good to have, and easy to code). Here are a few others:
- The function signature of
longest_recursive
is long, and is a maintenance headache. A shorter function signature is always preferable. - The
unordered_map
does not guarantee a collision-free lookup, and in worst case, the map lookup might not be constant time, resulting in degradation of performance. - Runtime errors in recursive (especially stack depth exceeding/bad base cases) are hard to debug.
- For this problem, a single iterative function is sufficient to compute the
max_count
and update the table as well, making the recursive implementation slightly longer.
We can modify the longest()
method above and replace the recursive call as follows:
Testing:
This problem demonstrates the use of ElementsAre
method of Gunit tests.
A few simple tests for empty lists:
Simple tests for identical arrays and for arrays with no common elements
And, finally the complex example in the problem statement: