Longest Common String Subsequence

2 min readSep 5, 2020

Problem:

Write a function that takes two string arrays as input and returns the longest contiguous sequence of strings that appear in both.

For example, given the following two arrays:

a = [“/home”, “/register”, “/login”, “/user”, “/one”, “/two”]
b = [“/home”, “/red”, “/login”, “/user”, “/one”, “/pink”]

You should return the following:

[“/login”, “/user”, “/one”]

Approach:

The key insight comes from asking: “Is there a recursive substructure to the problem that we can exploit?” and “Can we cache intermediate results of the recursive calls?”

Consider locations i and j in arrays a and b respectively. How can we find the longest common subsequence of arrays starting at a[i] and b[j]?
if a[i] == b[j] then the length of optimal subsequence is one plus the length of the longest common subsequence of arrays starting at a[i+1] and b[j+1].
Otherwise, the optimal subsequence is empty.
We can use Dynamic Programming to cache intermediate computation of the recursive call in a hash table to be reused later.

Once you have the recursive function, the longest common subsequence can be found by repeatedly calling it on the two arrays:

Iterative Implementation with Compact Loops:

What is the complexity of the above implementation? We call the longest_recursive function for every (i, j) pair of indices of the original arrays. Hence, the complexity is at least O(MN) where M and N are the sizes of the two arrays. Although we are making further recursive calls to longest_recursive for each (i, j), note that all but one of those calls return immediately by using a constant-time lookup in the table, and hence the total complexity is O(MN).

This complexity analysis underscores the limitations of the recursive implementation (although it’s pretty good to have, and easy to code). Here are a few others:

The function signature of longest_recursive is long, and is a maintenance headache. A shorter function signature is always preferable.
The unordered_map does not guarantee a collision-free lookup, and in worst case, the map lookup might not be constant time, resulting in degradation of performance.
Runtime errors in recursive (especially stack depth exceeding/bad base cases) are hard to debug.
For this problem, a single iterative function is sufficient to compute the max_count and update the table as well, making the recursive implementation slightly longer.

We can modify the longest() method above and replace the recursive call as follows:

Testing:

This problem demonstrates the use of ElementsAre method of Gunit tests.

A few simple tests for empty lists:

Simple tests for identical arrays and for arrays with no common elements

And, finally the complex example in the problem statement:

Longest Common String Subsequence

Problem:

Approach:

Iterative Implementation with Compact Loops:

Testing:

Written by CppCodingZen