F.Y.I

My materials for code interview:

You can have a look on the latest interview questions from careercup.com:
http://www.careercup.com/

And practice coding questions on Leetcode’s online judgement:
http://leetcode.com/

Here are my answers and summary for leetcode’s question on github:
https://github.com/benyl/leetcode

Here are some books that worth reading:

* CareerCup Top 150 Questions
* Cracking the Coding Interview
* C++Primer
* Head First Design Patterns

================================================================
Steps to solve algorithm problem on board:
================================================================

0) Collect detail requirements about the question: data structure, limitation, input types, error handling etc

1) Describe your solution by a simple case step by step.

2) Write your code base on the simple case.

3) “Mind compile” your code, check with corner cases.

4) Give the time & space complexity of your solution, think of ways to optimize them.

================================================================
And some useful knowledge (Good to know but not required):
================================================================

Sample coding questions from Google:
http://basicalgos.blogspot.com/?view=flipcard

如何设计一个LRU Cache
http://blog.csdn.net/hexinuaa/article/details/6630384

从B树、B+树、B*树谈到R 树
http://blog.csdn.net/v_july_v/article/details/6530142

Morris method (no recursive, no stack, O(1) space)
http://www.cnblogs.com/AnnieKim/archive/2013/06/15/MorrisTraversal.html

EXACT STRING MATCHING ALGORITHMS
http://www-igm.univ-mlv.fr/~lecroq/string/index.html

Dynamic Programming: 背包問題九講

Click to access DP.pdf

OOP questions (in java):
http://www.tutorials4u.net/java-faqs/core-java/index.html

教你如何迅速秒杀掉:99%的海量数据处理面试题
http://blog.csdn.net/v_july_v/article/details/7382693

从Trie树(字典树)谈到后缀树
http://blog.csdn.net/v_july_v/article/details/6897097

FB面试准备,这个人写的很详细,我觉得对其他面试也适用
http://www.mitbbs.com/article_t/JobHunting/32741713.html

大型WEB系统架构基本设计思路
http://www.it610.com/article/2619667.htm

增加你在美国找工作获得内推机会的10条建议
http://www.1point3acres.com/how-to-improve-your-chance-of-getting-referrals/

Get that job at Facebook
https://www.facebook.com/notes/facebook-engineering/get-that-job-at-facebook/10150964382448920

Facebook: How to crush your coding interview

系统设计总结:
http://www.mitbbs.com/article_t/JobHunting/32777529.html
http://www.wtoutiao.com/p/ef9tYN.html

================================================================
And something beyond algorithm:
================================================================

面试心仪的公司以前,我建议多面多几家公司,积累面试经验,你最起码要有一下体会:
1)面试提问千奇百怪你都遇到过,但你遇到没见过的问题不慌张。
2)现场面试过几家公司,特别是大公司。
3)足够强的体力和精神力挺过一天的面试。
4)自我介绍和项目经历倒背如流。
5)已经总结出一套跟面试官交流的经验,有几个常问的给面试官的问题。
6)算法基础扎实,对算法题有自己总结的一套经验和思路,面试前不盲目看网上的新题。熟练使用各种数据结构和优化思路,现场面对没遇过的算法题能够通过已有的知识一步一步推导。
7)对于有工作经验的工程师,最好能有自己的一套应对系统设计问题的思路,大致了解网站/分布式系统/移动开发的架构,能把设计题往自己熟悉的领域扩展,拉到自己的主场来分析。
8)关于系统设计,前端开发要考虑API设计,刷新机制,后端开发则需要考虑数据存储,状态更新,通知系统,分布式系统。最好你能有自己的一套理论归纳。简单来说就是减少数据量传输,状态更新的提示,用内存换时间性能、分布式系统的分配问题、负载均衡、数据同步。九章算法的课挺有用,mit bbs也有不错的系统设计总结,有个blog大概叫做a dummy guide to scalebilty写得就非常好,HighScalability这个网站也不错。还有一些网上的资源可以找找看。

这些不是说你特意准备以上几点,而是面试得多了,上面的几点你自然而然的就有了,你就知道自己准备充足了。不然通过的几率会比较低……

================================================================
关于OJ:
================================================================
用OJ训练算法基础是很有效的方法,现在有不少值得使用的OJ系统。我的GitHub (https://github.com/benyl/leetcode) 有当时全部leetcode解答和分析(130题),还有一个excel表格评价各题难度和归类,建议起码把3星及以下题目全部做完,和每一个类别做一个总结。Leetcode的题目我建议第一遍把题目做出来,第二遍精简代码,参考他人的答案,用尽可能简单和少的代码解决问题,这样对你的代码风格有很大帮助,注意,简单的代码不一定是最短的。

================================================================
关于读书时找工作的时间表:
================================================================
通常来说暑假实习需要提早一个学期投简历,前一年的9~12月份就应该一边读书考试,一边参加宣讲会和career fair,找认识的学长学姐内推,以及准备面试。
等到12月底学期结束,11~2月期间大公司面试完毕,没有就坚持到3~4月看能不能捞到一个暑假实习,不然就继续找。如果1、2月才开始找暑假实习,你基本上已经迟了(大公司intern head count很可能已经满了)。
等到春季学期结束,开始暑假实习,同时开始准备毕业和投简历找全职工作。
毕业前找全职工作的时间安排跟找实习差不多,同时有没有好的暑假实习经验也有很大影响。11月~1月面试完大公司。同上,1、2月才开始找的人基本上慢了一截。
最后学期继续一边上课一边投简历找全职工作,同时申请学生临时工作身份OPT。
如果工作不满意,可以继续一边工作一边准备面试更好的公司。
To be continue……

Good luck!

– Poly-Thinking

Algorithms: Longest Increasing Subsequence

*LIS is a quite typical concept/problem in dynamic programming and I can’t help to forget the nlogn solution again and again. Hopefully this article would stamp it on my brain.

Description: find the longest increasing subsequence. i.e. the LIS of { 1, 8, 5, 25, 10, 24, 19 } is {1,5,10,19}.

Solution: There are some attitudes to keep in mind for a optimal solution.

  • The word “sequence” implys a incremental solution.
  • Among all possible LISs of length L, we only consider/build upon the most extendible one(in this case, the one ending with the smallest value. A Greedy attitude)
  • Also, when a new number incrementally added to the end of candidate sequence, we can update the solution set in O(logn) using Binary Search. (quite a invisible property)

Chapter 1 – Arrays & Strings – Solution 1.3

Question:

1.3 Write a method to decide if two strings are permutation. i.e. “hello” is a permuataion of “oellh”.

If one string is a permutation of the other, they should has the same length and every character appear the same times in these two strings.

So the solution is to check the length and appear time of each character of these two strings.

We can use the same idea in question in 1.1 to solve this question. For example, use an array to store the appear time of all the characters in ASCII table.

Here is my solution, the time complexity was O(n) and space complexity is O(1).

bool
permutation(string a, string b)
{
  // if the length is different, can not be permutation
  if(a.size() != b.size()) return false;

  // use idea of Bucket-sort to count every ASCII character
  int bucket[256] = {};

  // count the apprear time of every character in a
  for(size_t i=0; i<a.size(); i++)
    bucket[a[i]]++;

  // the appear time of character in b shold be the same with a
  for(size_t i=0; i<b.size(); i++)
    if(bucket[b[i]]-- == 0) return false;

  return true;
}

Chapter 1 – Arrays & Strings – Solution 1.2

1.2 Write code to reverse a C-Style String (char* str). (C-String means that “abcd” is represented as five characters, including the null terminator.)

We can use two pointer, one is pointed to the first character, the other is pointed to the last character before the null pointer. Then we can swap the two character, and move the two pointer one step to each other and repeat.

Here is my solution, the time complexity  of this algorithm is O(n), and the space complexity is O(1).

void
reverse(char* str)
{
  size_t i = 0, j = 0;

  while(str[i]!=0) ++i; // go to the end of string "\0"
  --i; // move to the charcter before "\0"

  while(i>j) // swap the head and end characters one by one
  { // xor swap
    str[i] ^= str[j]; // a = a xor b
    str[j] ^= str[i]; // b = b xor (a xor b) = a
    str[i--] ^= str[j++]; // a = (a xor b) xor a = b
  }

  return;
}

Chapter 1 – Arrays & Strings – Solution 1.1

Cracking the code interviews – Solution

Chapter 1 Arrays & Strings

1.1 Implement an algorithm to determine if a string has all unique characters. What if you can not use additional data structure?

If the characters comes from the ASCII table, there can only be 256 types of them. We can use a boolean array of size 256 to mark each type of characters. This is the same idea of Bucket-sort algorithm.
About Bucket-Sort: http://en.wikipedia.org/wiki/Bucket_sort

When a new character is already marked as true (already appeared), then we will know that this string contains duplicate characters. The time complexity of isUnique() is O(n) and space complexity is O(1).

Here is my solution in c++:

bool
isUnique(string str)
{
  // if length bigger than 256, it must have duplicate character
  if(str.size() >= 256) return false;

  // use idea of Bucket-sort to find the duplicate character
  bool bucket[256] = {}; // remember every ASCII character

  for(size_t i=0; i<str.size(); ++i)
  {
    // mark the appeared character to true
    // if already set true, report the duplicated character
    if(bucket[str[i]])
      return false;

    bucket[str[i]] = true;
  }

  return true;
}

Another solution is to use a 256-bit-map to mark the characters in ASCII table.
The time / space complexity of isUniqueV2() is also O(n) / O(1).

bool
isUniqueV2(string str)
{
  // if length bigger than 256, it must have duplicate character
  if(str.size() >= 256) return false;

  // use idea of Bucket-sort to find the duplicate character
  uint64_t bucket[4] = {};

  for(size_t i=0; i<str.size(); ++i)
  {
    int index = str[i] / 64; // which bucket should be check
    int bit = str[i] % 64; // which bit in the bucket should be check

    // mark the appeared character to true
    // if already set true, report the duplicated character
    if( (bucket[index] & (1 << bit)) != 0)
      return false;

    bucket[index] |= (1 << bit);
  }

  return true;
}

A binary-search tree that provides “smaller-than-X counting”.

Question:

Let’s say we have several numbers. Is it possible to figure out the following tasks in O(nlogn) running time:

a) how many of them are smaller than a new number?

b) insert the new number into the group?

Idea:

1) sorted array + binary search can do that. But the inserting job would take O(n) in that way.

2) binary-search tree can do b). But it can’t do a).

Analysis:

(1) doesn’t seem to be a possible solution though. Let’s focus on 2).

Let’s think about how the insertion to be done in a binary tree: in an insertion, the inserting node reach its position by going from the root all the way down to the leaf. On the way, it compares to each node and picks the left or right path according to the result of the comparison. The number of comparisons along with the insertion is O(logn).

Maybe we can put something on the nodes to help us out in task 1).

Assume we are inserting a number 3 into the B-S tree. Let’s simulate the process of insertion.

step 1: 3 > root ? take the right path : the left path.

It seems like if 3 > root, then nodes[left_subtree] < root < 3. We can prove it by: a) all the nodes of the left subtree are put there by insertion; b) they all have been compared to root and are smaller than root.

Similarly, if 3 < root, then 3 < root < nodes[right_subtree].

This is a very useful information. If we think of the problem recursively, then

total_smaller_num_count

= smaller_node_count_in_tree(root, 3)

= left_subtree_node_count + 1 + smaller_node_count_in_tree(rightChild), if 3 > root.

= smaller_node_count_in_tree(leftChild), if 3 < root.

With this rough formula, we are now sure that task 1) can also be done with a binary-search tree in O(nlogn).

Pseudocode:

Insertion(root, num, smaller_count)

if num is larger than root’s num

smaller_count += left_node_count[root] + 1;

if right_node[root] is null

attach num to right_node[root].

else

Insertion(right_node[root], num, smaller_count);

right_node_count[root] += 1;

else

if left_node[root] is null

attach num to left_node[root];

else

Insertion(left_node[root], num, smaller_count);

left_node_count[root] +=1;

Design a stack which satisfies the push, pop, min with O(1)

Problem Description

Design a stack which satisfies the push, pop, min with O(1).

Analysis

Regular stack can easily fulfill the operation of push or pop with O(1).

However, an usual stack could not satisfy min with O(1).

If we use a single variable to storage the min value in the stack, it will cost O(n) to maintain the variable stores the min value when we push or pop value. (Need to scan the whole stack to find another min value.)

Thus, we use another stack to fulfill the min operation with O(1).

(The second stack stores the min value corresponds to the new push number.)

My Solution:

Push:

1. Push the number into the data stack as usual.

2. Compare the new push number with the top number in the min value stack.

(1)  If the min value stack is empty, just put the new push number into the min value stack.

(2) If the top number in the min value stack is smaller than the new push number, push the new push number into the min value stack.

(3) If the top number in the min value stack is larger than the new push number, copy the top number of the min value stack and push it into the min value stack.

 

For example,

If we push 2,6,4,1,5, the two stack (an usual stack and a stack store the min numbers) will be :

Stack A       Stack B(min value stack)

5                    1

1                    1

4                    2

6                    2

2                    2

 

Pop:

1. Pop the number into the data stack as usual.

2. Pop the top number in the min value stack.

 

Min:

Get the top number of the min value stack.

 

Longest Path Problem: to find a longest path of a directed acyclic graph

Problem Description

Suppose String A and B have the same length of m + 1, B can be attached to the end of A, if the first m characters of A are the same to the last m characters of B.

Given n strings in  the same length(m + 1 long), find out a longest attachment of these strings.

Analysis

We can split the problem into two parts: 1) attachment check for a given string. 2) find out a longest attachment.

 Attachment Check

*An Attachment can be made only if A and B have the same length of M + 1 and the last M elements of A are the same to the first M elements of B.

General way to do the M-element check that is easy but time consuming(O(n^2*M)).

A better way to do the M-element check is to use Trie.

Find out a longest attachment

Let’s thinking of the N given strings as N nodes. If B can be attached to the end of A, then node A has a oneway path to B.

Now, the problem has changed to finding a longest path of a directed acyclic graph.

General way to do that is easy but time consuming(O(n^2)). We can see lots of repetitive work during the O(n^2) process.

Let’s try Dynamic Programming(DP).

Define max_pre[i]: the maximum number of node[i]’s ancestors.

Then it is easy to have max_pre[child] = max(max_pre[child], max_pre[node] + 1).

It’s a top-down approach formula, which means it works only if parent nodes are done before child nodes.

So we’re going to apply this formula to the nodes in a Topological Ordering.

My Solution: Trie + Topological Ordering + Dynamic Programming

First, read the N given strings one by one.

Second, construct a Trie tree for the strings.

Third, create a list childs[N]. If node j can be attached to node i, then child[i]  has node j. Use Trie to speed up the Attachment Check.

Fourth, work out the topological ordering of the graph, then apply the DP formula.

Longest Increasing Problem

Longest Increasing problem:
The problem:
The original sequence is a[i]
i is the index of the numbers
n^2:
Construct an array :
length[j] is used to store the length of the longest increasing subsequence, and the last number of the subsequence is a[j].
j is the index of the numbers(which is the same as i)
When we want to find the value of length[j], we must scan the value of the length[0]……length[j-1], find the largest value and plus it with one and set it as the value of the length[j].
Thus,
We must use double loop to get the solution of LIS.
The first loop is from the first number of sequence to the last number of sequence, which is used to fill the value of length array.
Because if we want to find the particular length[j], we must have another loop to scan the value of the length[0]……length[j-1].
The runtime will be O(n^2).

n*log(n):
Just the same as above, the original sequence is a[i]
I is the index of the numbers.
Construct another sequence.
Because the length of the sequence we use in this algorithm is unfixed, we use a vector to represent the sequence.
The sequence we construct is b[j]
Here j has two meanings:
(1) J is the index of this sequence. b[j] is the last number of subsequence we find.(not the original sequence)
(2) j is the length of the subsequence whose last number is b[j].(we can get more information from the explanation below )

Which means:

b[0] : the length of the increasing subsequence is 1.
b[1] : the length of the increasing subsequence is 2.
b[2]: the length of the increasing subsequence is 3
.
.
.
.
b[j]: the length of the increasing subsequence is j+1.
And the length of sequence b(b[0]……b[j]) is the length of the longest increasing subsequence.(This is the result of the LIS problem).

Because:
If the length of the two subsequence we find are same, the subsequence whose last number is smaller has much chance to form another increasing subsequence.
For example:
Subsequence 1: 0 1 8
Subsequence 2: 0 1 2
Obviously, the length of subsequence 2 is the same as subsequence 1. But the subsequence 2 has much chance to form another increasing subsequence.
For instance,
We scan the original sequence one by one.
If we find the number ”4”, we could not put the number”4” at the end of the subsequence “0 1 8”.
Because “0 1 8 4” is not an increasing subsequence.
But “0 1 2 4” is a new increasing subsequence whose length is 4.

Assume that:
The last number in an increasing subsequence Sa is larger than the another increasing subsequence Sb
The continue numbers we scan which can form another new longer increasing subsequence with Sa can also form a new longer increasing subsequence with Sb.

Thus,
Here the sequence b (b[0]…….b[j]):
J represents the length of the subsequence we find.
B[j] is the smallest last number of the subsequence whose length is j.

Solution:
After we understood the exact meaning of the sequence b, we can construct the solution.
We scan the original sequence a (a[0]……a[j]) from the beginning to the end of the sequence.
We can scan the sequence b:
When b[j] is smaller than a[i], it means that the subsequence b[j] represent can form a new subsequence with a[i].
So we should find an index j, which :
b[j]<a[i] and b[j+1]>a[i].
And then we replace the b[j+1] with a[i].
If we could not satisfy the condition above, we put a[i] at the end of the sequence b
So b sequence will still keep its characteristics.

Thus, we can scan the number in the original sequence one by one from the beginning to the end and try to construct the whole sequence b. The length of the sequence b will be the answer of the LIS.

The sequence b must be in increasing order.
Prove:
The method we construct the sequence b
(1) b[j]<a[i], b[j+1]>a[i], we replace the b[j+1] with a[i].
(2) if we could not satisfy the condition (1), we can put a[i] at the end of sequence b

Assume sequence b is in order at the beginning,
(1)
If we satisfy the b[j+1]>a[i],
because sequence b is in order, b[j+2]>b[j+1]>a[i],even if b[j+1] is replaced with a[i], the sequence b will still be in increasing order.
(2) Of course put the largest one at the end of the sequence b will not change the increasing order of sequence b.

Now we prove that sequence b will be in increasing order.
Thus,
we do not need to scan the whole sequence, we can use binary search.
If we use the binary search, we could degrade the second loop to O(log n).

So the total algorithm will also use double loop. The first loop is n, the second loop is logn.
And the total algorithm will be O(nlogn).

Attention:
The sequence b is not the subsequence we find. If we use this algorithm, we can only find the longest length, we could not know the content of the longest increasing subsequence.