Algorithms: Longest Increasing Subsequence

*LIS is a quite typical concept/problem in dynamic programming and I can’t help to forget the nlogn solution again and again. Hopefully this article would stamp it on my brain.

Description: find the longest increasing subsequence. i.e. the LIS of { 1, 8, 5, 25, 10, 24, 19 } is {1,5,10,19}.

Solution: There are some attitudes to keep in mind for a optimal solution.

  • The word “sequence” implys a incremental solution.
  • Among all possible LISs of length L, we only consider/build upon the most extendible one(in this case, the one ending with the smallest value. A Greedy attitude)
  • Also, when a new number incrementally added to the end of candidate sequence, we can update the solution set in O(logn) using Binary Search. (quite a invisible property)

Leetcode: Distinct Subsequences

Given two strings A and B, find out how many distinct subsequences in A that are exactly equal to B.

Let get close to what is a subsequence.

If S is a subsequence of B, then each element of S can be assigned with an index at B. A valid assignment is that indices of B’s elements aren’t overlapping and in an ascending order.

Then we can come up with a naive solution.

  1. for_each(S.begin(), S.end(), assign(char ch, T, range) {…}

Notice that the range parameter represents the constraint that in a valid assignment, indices are in ascending order.

Now this isn’t what all we are looking for. The question requires the number of valid distinct subsequences we can find in T. We will traverse all possible states in order to produce the correct number.

strawmen solution 2: use backtracking to traverse all possible states.

Let’s get close to the details in this solution. Implementing the backtracking algorithm would be the best way to achieve that purpose.

class Solution {
public:
    int numDistinct(string S, string T) {
        return BT(S.c_str(), T.c_str());
    }

    int BT(const char* s, const char* t) {
        if (*t == '\0') return 1;
        if (*s == '\0') return 0;
        if (*s == *t) 
            return BT(s+1, t+1) + BT(s+1, t);
        return BT(s+1, t);
    }
};

This can’t be wrong because BT is THE WAY to traverse all possible states! But hold on, what is the running time of this BT? The question is equivalent to how many sequences we traverse. And the answer is O(C(n, m)), where n is the length of the original string and m is the length of the target subsequence. That is almost n sqrt, which doesn’t seem acceptable for strings processing(strings are large in real life). Does any state have some nice properties that cause the same results, or are there any duplicate states??

Strawman solution 3: 

each call to BT() represents a unique intermediate state. It’s not hard to tell that in the overall BT process, there are duplicate intermediate states. Recomputation on duplicate states are supposed to be cut out by just simply utilizing the previous result.

class Solution {
    vector<vector<int>> mem;

public:
    int numDistinct(string S, string T) {
        mem = vector<vector<int>>(S.length(), vector<int>(T.length(), -1));
        return BT(S, T, 0, 0);
    }

    int BT(string& S, string&T, int si, int ti) {
        // a valid assignment has found
        if (ti == T.length()) {
            return 1;
        }
        // reach the end, no valid assignment found.
        if (si == S.length()) {
            return 0;
        }
        // duplicate states?
        if (mem[si][ti] > -1)
            return mem[si][ti];

        // try to match S[si] with T[ti]
        int sum = 0;
        if (S[si] == T[ti]) {
            sum += BT(S, T, si+1, ti+1);
        }
        // try to match T[ti] with some later element in S.
        sum += BT(S, T, si+1, ti);
        mem[si][ti] = sum;
        return sum;
    }
};

Concurrency: Mutex vs Spinlock

Mutex

Differences between mutexs and spinlocks.

  • If a thread failed to lock a mutex, it gets on waitlist and goes sleep.
  • If a thread failed in a spinlock, it busywait for the lock to release.

It essentially is trade-off between context-switch and busy-wait. Spinlocks are good for short-period wait while mutexs are good for long-period wait.

reference http://stackoverflow.com/questions/5869825/when-should-one-use-a-spinlock-instead-of-mutex

Unix: File Access Permission

How does File Access Permission work on Unix?

Screen Shot 2013-06-09 at 11.06.00 PM

drwxr-xr-x, totally 10 bits, represents the access permission of the current directory. Here are the rules and explanations:

  • First bit indicates the file type. ‘d’ represents ‘directory’ and ‘-‘ represents ‘regular file'(not sure actually. It seems to have other file types).
  • Bits from 2nd to 4th indicate different access permissions for the owner(weijingliu in this case).
  • Similarly, bits from 5th to 7th indicate access permissions for the group.
  • The last 3 bits indicate access permissions for the other users.

Notice that there are additional rules to apply for directories. Suppose we issue a cd command, “cd ~/Documents/download”. Unix File System(UFS) actually performs a search from root to the destination. Meanwhile, we must have execute permission in each directory mentioned in the name.

Reference: Ch 4.5. Advanced Programming in the UNIX Environment

Leetcode: Word Ladder II

Problem description:

http://leetcode.com/onlinejudge#question_126

Given two words (start and end), and a dictionary, find all shortest transformation sequence(s) from start to end, such that:

Only one letter can be changed at a time
Each intermediate word must exist in the dictionary
For example,

Given:
start = “hit”
end = “cog”
dict = [“hot”,”dot”,”dog”,”lot”,”log”]
Return
[
[“hit”,”hot”,”dot”,”dog”,”cog”],
[“hit”,”hot”,”lot”,”log”,”cog”]
]
Note:
All words have the same length.
All words contain only lowercase alphabetic characters.

 

Design and Pseudo code:

The same as Word Ladder I, use BFS to search all shortest possible path from start to end.

The tricky point in this problem is that the naive solution will easily exceed the running time limit, optimization is needed for this problem, i.e. eliminate the time for string comparing and copying.

 

1) map the dictionary to index, remember the mapping between words and indexes.

2) for start index, use BFS to find all possible path to end index. At first push start index into queue.

2.1) for every words in queue, find the adjacent list of them by searching dictionary for 1 step transfer words

2.2) for every word in adjacent list, push into queue and remember their depth and their parent word index

2.3) if found end index, stop searching

3) use backtracking form end index to generate the full path reversely, and translate from index back to strings

Solution & Analysis:

uses 668 milli secs for large judge
for word length k, and searching in dictionary takes O(1), and the depth of trasation is d
time complexity is O((k*26)^d), space complexity is O((k*26)^d)

class Solution {
    // function to generate path using backtracking method
    void gen(int v1, int v2, vector<string> &vdict, vector<vector<int> >& prev, 
             vector<int>& path, vector<vector<string> >&ans){
             
        path.push_back(v2); // push the nodes into path for back tracking
        
        if(v2 == v1 and path.size() > 1){
            ans.push_back(vector<string>());
            // translate the path of index to string in reverse order
            for(auto rit = path.rbegin(); rit != path.rend(); rit++)
                ans.back().push_back(vdict[*rit]);
        }else{
            // gengrate all possible path in recursive way
            for(int i = 0; i < prev[v2].size(); i++)
                gen(v1, prev[v2][i], vdict, prev, path, ans);
        }
        
        path.pop_back(); // pop the nodes from path for back tracking
    }
public:
    vector<vector<string> > findLadders(string start, string end, unordered_set<string> &dict) {
        dict.insert(start);
        dict.insert(end);
        
        vector<string> vdict(dict.begin(), dict.end()); // vector dictionary: id -> word mapping in dict
        unordered_map<string, int> ids;  // index dictionary: word -> id mapping in vdict
        vector<vector<int> > prev(dict.size()); // store the previous words in BFS
        vector<int> distance(dict.size(), -1); // store the distance from start
        
        // build string - index mapping, transfer problem to graph search
        // use interger instead of string to eliminate cost of string matching
        for(int i = 0; i < vdict.size(); i++)
            ids[vdict[i]] = i;
        
        // find the index of start and end words
        int vstr=0, vend=0;
        while(vdict[vstr] != start) vstr++;
        while(vdict[vend] != end) vend++;
        
        // use queue for BFS to search path from start to end
        queue<int> que;
        que.push(vstr);
        distance[vstr]=0;
        
        while(not que.empty()){
            int v1 = que.front(); que.pop();
            if(v1 == vend) break;
            int d = distance[v1] + 1;
            
            // get adjancent list of the words, branching factor of BFS is 26 characters * string size
            vector<int> adj;
            // erase the appeared words from dictionary can save search time
            ids.erase(vdict[v1]);
            
            // find all the words that can be transfered in 1 step
            for(int j = 0; j < vdict[v1].size(); j++){
                char w = vdict[v1][j];
                for(char c = 'a'; c <= 'z'; c++){
                    vdict[v1][j] = c;
                    if(ids.count(vdict[v1]))
                        adj.push_back(ids[vdict[v1]]);
                    vdict[v1][j] = w;
                }
            }
            
            // use BFS to calculate path to end, add new words (distance=-1) into queue
            // if multiple words can reach the same words, save all the words
            for(int i = 0; i < adj.size(); i++){
                int v = adj[i];
                if(distance[v] == -1){
                    prev[v].push_back(v1);
                    distance[v] = d;
                    que.push(v);
                }else if(distance[v] == d){
                    prev[v].push_back(v1);
                }
            }
        }

        // generate the full sting path using the index
        vector<vector<string> > ans;
        vector<int> path;
        gen(vstr, vend, vdict, prev, path, ans);
        return ans;
    }
};

Leetcode: Word Ladder

Problem description:

http://leetcode.com/onlinejudge#question_127

Given two words (start and end), and a dictionary, find the length of shortest transformation sequence from start to end, such that:

Only one letter can be changed at a time
Each intermediate word must exist in the dictionary
For example,

Given:
start = “hit”
end = “cog”
dict = [“hot”,”dot”,”dog”,”lot”,”log”]
As one shortest transformation is “hit” -> “hot” -> “dot” -> “dog” -> “cog”,
return its length 5.

Note:
Return 0 if there is no such transformation sequence.
All words have the same length.
All words contain only lowercase alphabetic characters.

 

Design and Pseudo code:

Use BFS to search all possible words from start, and the tricky point here is that you should search thru all words in dictionary for candidate words, this may take o(n) time for searching in each recursion.

Instead, replace one character in a time, and search in dictionary by hashing, assume that searching in dictionary takes o(1) and the length of string is k, this may take o(26*k) time for searching in each recursion, when dictionary size is much bigger than string length (n >> k), this will save lots of time.

1) push start word into queue for BFS

2) for each word in queue:

2.1) search all new words in dictionary that can be transfered in 1 step

2.2) if found end words, stop and return distance, else push words into queue

Solution & Analysis:

uses 608 milli secs for large judge
for word length k, and searching in dictionary takes O(1), and the depth of trasation is d
time complexity is O((k*26)^d), space complexity is O((k*26)^d)

class Solution {
public:
    int ladderLength(string start, string end, unordered_set<string> &dict) {
        if(start == end) return 0;
        if(start.size() != end.size()) return 0;
        
        // use queue for BFS, store word string and their distance from start
        queue<pair<string, int> > que;
        que.push(make_pair(start, 1));
        dict.erase(start);
        
        while(!que.empty()) {
            string str = que.front().first;
            int len = que.front().second;
            que.pop();
            
            // branching factor of BFS is 26 characters * string size
            for(int i=0; i<str.size(); i++)
            for(char c='a'; c<='z'; c++) {
                char chi = str[i];
                str[i] = c;
                
                if(str == end) return len+1;
                
                // use count instead of find to make searching faster
                if(dict.count(str) != 0) {
                    que.push(make_pair(str, len+1));
                    // erase the appeared words from dictionary can save search time
                    dict.erase(str);
                    if(dict.empty()) break;
                }
                
                str[i] = chi;
            }
        }
        return 0;
    }
};

Leetcode: Palindrome Partitioning II

Problem description:

http://leetcode.com/onlinejudge#question_132

Given a string s, partition s such that every substring of the partition is a palindrome.

Return the minimum cuts needed for a palindrome partitioning of s.

For example, given s = “aab”,

Return 1 since the palindrome partitioning [“aa”,”b”] could be produced using 1 cut.

Design and Pseudo code:

This problem is similar to Palindrome Partitioning I, however, the large test set requires us to limit the time complexity to a low level.

The general idea for this problem is to use dyanmic programming.

Naive approach:

1) if s is Palindrome, return 0
2) else use a (n X n) matrix to store mincut(s[i:i+len]), in here s[i:i+len] means the substring of s that starts a s[i] and ends at s[i+len]
3) for i=0 to s.size-1: initialize mincut[i,1] = 0
4) for len = 2 to s.size:

4.1) for i=0 to s.size-len:

4.2) if s.sub(i, len) is Palindrome: mincut[i,len] = 0

4.3) else mincut[i,len] = min(mincut[i,j-i] + mincut[j,len-j]) + 1

5) return dp[0, s.size]

Improved approach:

1) use an array to store minCut of s[:i] and palidrome s[:i]

2) for i = 0 to s.size()

2.1) for j = 0 to i

2.2) if s[j] == s[i] and s[j+1 : i-1] is palidrome, then s[j:i] is palidrome:

2.3) update mincut(s[:i]) = minimum of mincut(s[:j]) + 1

2.4) else or else mincut(s[:i]) = mincut(s[:i-1]) + 1

3) return mincut(s)

Solution & Analysis:

Version 1, Time Limit Exceeded, time complexity is o(n^3), space complexity is o(n^2).

class Solution {
public:
    bool isPalindrome(string s) {
        if(s.size() == 0) return true;
        if(s.size() == 1) return true;
        
        for(int i=0; i<s.size()/2; ++i)
            if(s[i] != s[s.size()-1-i])
                return false;
        return true;
    }

    int minCut(string s) {
        // use dynamic programming
        // if s is Palindrome, push to answer
        // use a (nXn) matrix to store mincut[i,len]
            // for i=0 to s.size-1: mincut[i,1] = 0
            // for len = 2 to s.size:
                // for i=0 to s.size-len: 
                    // if s.sub(i, len) is Palindrome: mincut[i,len] = 0
                    // else mincut[i,len] = min(mincut[i,j-i] + mincut[j,len-j]) + 1
            // return dp[0, s.size]
                
        if(isPalindrome(s))
            return 0;
            
        vector<vector<int> > dp; // use to store minCut[i, len]
        
        // initial minCut (minCut[i, len] means the result of minCut(s.sub(i, len)))
        for(int i=0; i<s.size(); ++i)
            dp.push_back(vector<int>(s.size()-i+1, 0));
        
        for(int len=2; len<=s.size(); ++len) {
            for(int i=0; i<=s.size()-len; ++i) {
                if((s[i] == s[i+len-1]) && (dp[i+1][len-2]==0)) {
                    dp[i][len] = 0;
                } else {
                    dp[i][len] = len-1;
                    for(int l=1; l<len; ++l) {
                        int cut = dp[i][l] + dp[i+l][len-l] + 1;
                        dp[i][len] = (dp[i][len] < cut) ? dp[i][len] : cut;
                        if(dp[i][len] < 2) break;
                    }
                }
            } // end of: for(int i=0; i<=s.size-len; ++i)
        } // end of: for(int len=2; len<s.size; ++len)
        
        return dp[0][s.size()];
    }
};

Version 2, uses 120 milli secs for large judge, time complexity is o(n^2), space complexity is o(n^2).

class Solution {
public:
    int minCut(string s) {
        int len = s.size();
        int dp[len]; // store minCut of s[i:]
        bool palidrome[len][len]; // store if s[i:j] is palidrome
        
        // init dp of mincut with maximum cut for s[i:] is (len-i-1)
        for(int i=0; i<=len; ++i)
            dp[i] = len-i-1;
        for(int i=0; i<len; ++i)
        for(int j=0; j<len; ++j)
            palidrome[i][j] = false;
        
        // start from end to head of string
        for(int i=len-2; i>=0; --i)
        for(int j=i; j<len; ++j)
            // only consider when s[i:j] is palidrome (s[i] == s[j] and s[i+1:j-1] is palidrome)
            // or else mincut(s[i:]) = mincut(s[i+1:]) + 1
            if((s[i] == s[j]) && (j-i<2 || palidrome[i+1][j-1])) {
                palidrome[i][j] = true;
                // mincut s[i:] is minimum of mincut(s[j:]) + 1 (that is, first cut at j)
                dp[i] = min(dp[i], dp[j+1]+1);
            }
        
        return dp[0];
    }
};

Version 3, uses 28 milli secs for large judge, time complexity is o(n^2), space complexity is o(n).

class Solution {
public:
    int minCut(string s) {
        if (s.size() < 2)
            return 0;
        int len = s.size();
        // use two arrays for dyanmic programming
        int minSegs[len+1]; // store the min palidrome segments of s[:i]
        int palLen[len+2]; // store all lengths of palidrome that ends at s[i]
        
        // initialize the minSegs and possible lengths of palidrome for s[:1]
        minSegs[0] = 0; minSegs[1] = 1;
        palLen[0] = 0; palLen[1] = 1; palLen[2] = -1;
        
        for (int i=1; i<s.size(); i++) {
            // init mincut(s[:i+1]) = mincut(s[:i]) + 1
            minSegs[i+1] = minSegs[i] + 1;
            int n = 0; // numbers of palidrome that ends at s[i]
            // for all palidrome that ends at s[i-1], check if s[left:i] is palidrome
            for (int j=0; palLen[j] != -1; j++) {
                int left = i-1-palLen[j];
                // if s[left:i] is palidrome, add length of s[left:i] into array and update minSegs[i+1]
                if (left>=0 && s[left]==s[i]) {
                    palLen[n++] = palLen[j] + 2;
                    // first cut at left and s[left:i] is palidrome -> minSegs[i+1] = minSegs[left] + 1
                    minSegs[i+1] = min(minSegs[i+1], minSegs[left] + 1);
                    if(minSegs[i+1] == 1) break;
                }
            }
            // add lengths 0 and 1 to array of palidrome length for s[:i]
            palLen[n++] = 0; palLen[n++] = 1; palLen[n] = -1;
        }
        
        // minCut(s) = minSeg(s) - 1
        return minSegs[s.size()] - 1;
    }
};

Leetcode: Palindrome Partitioning

Problem description:

http://leetcode.com/onlinejudge#question_131

Given a string s, partition s such that every substring of the partition is a palindrome.

Return all possible palindrome partitioning of s.

For example, given s = “aab”,
Return

[
[“aa”,”b”],
[“a”,”a”,”b”]
]

Design and Pseudo code:

To get all possible palindrome partitioning of s, we need to try all possible partitions recursively.

Partition(s):

1) find all possible palindrome in s that start with the first letter

2.1) for each palindrome p, get the substring s1 of s that have the characters after p, so s = p + s1

2.2) find all possible palindrome partition of s1 by calling partition(s1), add p to the begin of return and save to answers

3) return answers

Solution:

class Solution {
public:
    // function to find if s is palindrome
    bool isPalindrome(string s) {
        if(s.size() == 0) return true;
        if(s.size() == 1) return true;

        for(int i=0; i<s.size()/2; ++i)
            if(s[i] != s[s.size()-1-i])
                return false;
        return true;
    }

    vector<vector > partition(string s) {
        vector<vector > answer;

        if(s.size() == 0)
            return answer;

        // if s is Palindrome, push to answer
        if(isPalindrome(s))
            answer.push_back(vector(1,s));

        if(s.size() == 1)
            return answer;

        for(int i=1; i<s.size(); ++i) {
            string s1 = s.substr(0, i);
            string s2 = s.substr(i, s.size()-i);

            if(isPalindrome(s1)) {
                // get sub_answers of partition(s[i:])
                vector<vector > sub_answer = partition(s2);

                // add s1 to the begin of sub_answers
                for(int j=0; j<sub_answer.size(); ++j) {
                    sub_answer[j].insert(sub_answer[j].begin(), s1);
                    answer.push_back(sub_answer[j]);
                }
            }
        }

        return answer;
    }
};

Analysis:

Passed leetcode judge large, 464 milli secs.

The time complexity of function isPalindrome takes o(n), and the partition will call isPalindrome n times, which takes o(n^2).

And the creation of s1, s2 takes o(n), and was done n times, totally o(n^2), and the all possible combinations of result is 2^n.

And the partition will recursively call itself n times, so the total time complexity is o(n*2^n).

The space complexity of partitions is the same as all possible combination of partition of string, which is o(2^n).

Leetcoding in C++11: Vector Initialization

How to initialize an array[m][n] with all elements in -1

C++98:

vector<vector<int>> array(m, vector<int>(n, -1)); // vector (size_type n, const value_type& val)

C++11:

vector<vector<int>> array(m, vector<int>(n, -1)); // vector (size_type n, const value_type val)

How to initialize an array[m] with another array {0, 1, table[1], -1*5}

C++98

int il[5] = {0, 1, table[1], -1*5}
vector<int> array(il, il+4); // vector (InputIterator begin, InputIterator end)

C++11

vector<int> array{0, 1, table[1], -1*5}; // vector (initializer_list)

References:

vector

http://www.cplusplus.com/reference/vector/vector/vector/

Leetcoding in C++11: Min/Max

How to get the minimum value of a set of number?

C++98:

int nums[] = { 1, a, tabl[0], 0 };
cout << *max_element(num, sizeof(nums)/sizeof(int));

C++11:

//1.
int nums[] = { 1, a, table[0], 0 };
cout << *max_element(begin(nums), end(num)); // T* end(T (&a)[N])
//2.
cout << max({1, a, table[0], 0}); // T max(initlizer_list<T> il)

References:

std::begin/end

http://www.cplusplus.com/reference/iterator/end/?kw=end

std::max

http://www.cplusplus.com/reference/algorithm/max/?kw=max