Problem Description

Suppose String A and B have the same length of m + 1, B can be attached to the end of A, if the first m characters of A are the same to the last m characters of B.

Given n strings in the same length(m + 1 long), find out a longest attachment of these strings.

Analysis

We can split the problem into two parts: 1) attachment check for a given string. 2) find out a longest attachment.

Attachment Check

*An Attachment can be made only if A and B have the same length of M + 1 and the last M elements of A are the same to the first M elements of B.

General way to do the M-element check that is easy but time consuming(O(n^2*M)).

A better way to do the M-element check is to use Trie.

Find out a longest attachment

Let’s thinking of the N given strings as N nodes. If B can be attached to the end of A, then node A has a oneway path to B.

Now, the problem has changed to finding a longest path of a directed acyclic graph.

General way to do that is easy but time consuming(O(n^2)). We can see lots of repetitive work during the O(n^2) process.

Let’s try Dynamic Programming(DP).

Define max_pre[i]: the maximum number of node[i]’s ancestors.

Then it is easy to have max_pre[child] = max(max_pre[child], max_pre[node] + 1).

It’s a top-down approach formula, which means it works only if parent nodes are done before child nodes.

So we’re going to apply this formula to the nodes in a Topological Ordering.

My Solution: Trie + Topological Ordering + Dynamic Programming

First, read the N given strings one by one.

Second, construct a Trie tree for the strings.

Third, create a list childs[N]. If node j can be attached to node i, then child[i] has node j. Use Trie to speed up the Attachment Check.

Fourth, work out the topological ordering of the graph, then apply the DP formula.

Longest Increasing problem:
The problem:
The original sequence is a[i]
i is the index of the numbers
n^2:
Construct an array :
length[j] is used to store the length of the longest increasing subsequence, and the last number of the subsequence is a[j].
j is the index of the numbers(which is the same as i)
When we want to find the value of length[j], we must scan the value of the length[0]……length[j-1], find the largest value and plus it with one and set it as the value of the length[j].
Thus,
We must use double loop to get the solution of LIS.
The first loop is from the first number of sequence to the last number of sequence, which is used to fill the value of length array.
Because if we want to find the particular length[j], we must have another loop to scan the value of the length[0]……length[j-1].
The runtime will be O(n^2).

n*log(n):
Just the same as above, the original sequence is a[i]
I is the index of the numbers.
Construct another sequence.
Because the length of the sequence we use in this algorithm is unfixed, we use a vector to represent the sequence.
The sequence we construct is b[j]
Here j has two meanings:
(1) J is the index of this sequence. b[j] is the last number of subsequence we find.(not the original sequence)
(2) j is the length of the subsequence whose last number is b[j].(we can get more information from the explanation below )

Which means:

b[0] : the length of the increasing subsequence is 1.
b[1] : the length of the increasing subsequence is 2.
b[2]: the length of the increasing subsequence is 3
.
.
.
.
b[j]: the length of the increasing subsequence is j+1.
And the length of sequence b(b[0]……b[j]) is the length of the longest increasing subsequence.(This is the result of the LIS problem).

Because:
If the length of the two subsequence we find are same, the subsequence whose last number is smaller has much chance to form another increasing subsequence.
For example:
Subsequence 1: 0 1 8
Subsequence 2: 0 1 2
Obviously, the length of subsequence 2 is the same as subsequence 1. But the subsequence 2 has much chance to form another increasing subsequence.
For instance,
We scan the original sequence one by one.
If we find the number ”4”, we could not put the number”4” at the end of the subsequence “0 1 8”.
Because “0 1 8 4” is not an increasing subsequence.
But “0 1 2 4” is a new increasing subsequence whose length is 4.

Assume that:
The last number in an increasing subsequence Sa is larger than the another increasing subsequence Sb
The continue numbers we scan which can form another new longer increasing subsequence with Sa can also form a new longer increasing subsequence with Sb.

Thus,
Here the sequence b (b[0]…….b[j]):
J represents the length of the subsequence we find.
B[j] is the smallest last number of the subsequence whose length is j.

Solution:
After we understood the exact meaning of the sequence b, we can construct the solution.
We scan the original sequence a (a[0]……a[j]) from the beginning to the end of the sequence.
We can scan the sequence b:
When b[j] is smaller than a[i], it means that the subsequence b[j] represent can form a new subsequence with a[i].
So we should find an index j, which :
b[j]<a[i] and b[j+1]>a[i].
And then we replace the b[j+1] with a[i].
If we could not satisfy the condition above, we put a[i] at the end of the sequence b
So b sequence will still keep its characteristics.

Thus, we can scan the number in the original sequence one by one from the beginning to the end and try to construct the whole sequence b. The length of the sequence b will be the answer of the LIS.

The sequence b must be in increasing order.
Prove:
The method we construct the sequence b
(1) b[j]<a[i], b[j+1]>a[i], we replace the b[j+1] with a[i].
(2) if we could not satisfy the condition (1), we can put a[i] at the end of sequence b

Assume sequence b is in order at the beginning,
(1)
If we satisfy the b[j+1]>a[i],
because sequence b is in order, b[j+2]>b[j+1]>a[i],even if b[j+1] is replaced with a[i], the sequence b will still be in increasing order.
(2) Of course put the largest one at the end of the sequence b will not change the increasing order of sequence b.

Now we prove that sequence b will be in increasing order.
Thus,
we do not need to scan the whole sequence, we can use binary search.
If we use the binary search, we could degrade the second loop to O(log n).

So the total algorithm will also use double loop. The first loop is n, the second loop is logn.
And the total algorithm will be O(nlogn).

Attention:
The sequence b is not the subsequence we find. If we use this algorithm, we can only find the longest length, we could not know the content of the longest increasing subsequence.

Poly-Thinking

From Poly Study Group

Monthly Archives: November 2011

Longest Path Problem: to find a longest path of a directed acyclic graph