F.Y.I

My materials for code interview:

You can have a look at the latest interview questions on careercup.com:
http://www.careercup.com/

And practice coding questions on LeetCode’s online judge:
http://leetcode.com/

Here are my answers and summaries for LeetCode’s questions on GitHub:
https://github.com/benyl/leetcode

Here are some books worth reading:

* CareerCup Top 150 Questions
* Cracking the Coding Interview
* C++ Primer
* Head First Design Patterns

================================================================
Steps to solve algorithm problem on board:
================================================================

0) Collect detailed requirements about the question: data structures, limitations, input types, error handling, etc.

1) Describe your solution by a simple case step by step.

2) Write your code based on the simple case.

3) “Mind compile” your code and check it against corner cases.

4) Give the time & space complexity of your solution, think of ways to optimize them.

================================================================
And some useful knowledge (Good to know but not required):
================================================================

Sample coding questions from Google:
http://basicalgos.blogspot.com/?view=flipcard

How to design an LRU Cache
http://blog.csdn.net/hexinuaa/article/details/6630384

From B-trees, B+ trees, and B* trees to R-trees
http://blog.csdn.net/v_july_v/article/details/6530142

Morris traversal (no recursion, no stack, O(1) space)
http://www.cnblogs.com/AnnieKim/archive/2013/06/15/MorrisTraversal.html

EXACT STRING MATCHING ALGORITHMS
http://www-igm.univ-mlv.fr/~lecroq/string/index.html

Dynamic Programming: Nine Lectures on the Knapsack Problem (背包問題九講)

DP.pdf

OOP questions (in java):
http://www.tutorials4u.net/java-faqs/core-java/index.html

How to quickly crush 99% of large-scale data processing interview questions
http://blog.csdn.net/v_july_v/article/details/7382693

From tries (prefix trees) to suffix trees
http://blog.csdn.net/v_july_v/article/details/6897097

Facebook interview preparation; this writeup is very detailed, and I think it applies to other interviews too
http://www.mitbbs.com/article_t/JobHunting/32741713.html

Basic design ideas for large-scale web system architecture
http://www.it610.com/article/2619667.htm

10 tips to increase your chances of getting referrals when job hunting in the US
http://www.1point3acres.com/how-to-improve-your-chance-of-getting-referrals/

Get that job at Facebook
https://www.facebook.com/notes/facebook-engineering/get-that-job-at-facebook/10150964382448920

Facebook: How to crush your coding interview

System design summary:
http://www.mitbbs.com/article_t/JobHunting/32777529.html
http://www.wtoutiao.com/p/ef9tYN.html

================================================================
And some things beyond algorithms:
================================================================

Before interviewing at the company you really want, I suggest interviewing at several other companies first to build up experience. At a minimum, you should have reached the following state:
1) You have seen all kinds of strange interview questions, and you no longer panic when you hit one you have never seen.
2) You have done onsite interviews at a few companies, especially big ones.
3) You have enough physical and mental stamina to get through a full day of interviews.
4) You know your self-introduction and project experience by heart.
5) You have worked out your own way of communicating with interviewers, including a few go-to questions to ask them.
6) Your algorithm fundamentals are solid and you have summarized your own approach to algorithm problems; you don't blindly cram new problems from the internet right before an interview. You can fluently use all kinds of data structures and optimization ideas, and when you face an unseen problem onsite you can derive a solution step by step from what you already know.
7) For engineers with work experience, it's best to have your own framework for system design questions: roughly understand the architecture of websites, distributed systems, and mobile development, and be able to steer a design question toward an area you know well, pulling it onto your home turf for analysis.
8) For system design, front-end developers should think about API design and refresh mechanisms, while back-end developers need to think about data storage, state updates, notification systems, and distributed systems. Ideally you have your own summarized theory. In short: reduce the amount of data transferred, signal state updates, trade memory for time performance, and handle work distribution, load balancing, and data synchronization in distributed systems. The Jiuzhang (九章) algorithm course is quite useful, MIT BBS has some good system design summaries, a blog called something like "a dummy guide to scalability" is very well written, and the HighScalability website is also good. There are more resources online if you look.

These are not points you prepare for deliberately; once you have interviewed enough, they come naturally, and that is how you know you are well prepared. Otherwise your odds of passing will be fairly low…

================================================================
About online judges (OJ):
================================================================
Training your algorithm fundamentals on an OJ is very effective, and there are quite a few OJ systems worth using. My GitHub (https://github.com/benyl/leetcode) has all of my LeetCode solutions and analyses from that time (130 problems), plus an Excel sheet rating each problem's difficulty and category. I suggest at least finishing every problem rated 3 stars or below and writing a summary for each category. For LeetCode problems, I suggest solving each problem on the first pass, then simplifying your code on the second pass: consult other people's answers and solve the problem with as simple and as little code as possible. This helps your coding style a lot. Note that simple code is not necessarily the shortest code.

================================================================
On the timeline for job hunting while in school:
================================================================
Generally you need to send out résumés one semester ahead for a summer internship: from September to December of the previous year, you should be studying for classes and exams while also attending info sessions and career fairs, asking senior students you know for referrals, and preparing for interviews.
The semester ends in late December, and big companies finish interviewing during November through February. If you have nothing by then, keep at it through March and April to see whether you can still land a summer internship, and keep looking otherwise. If you only start looking for a summer internship in January or February, you are basically already late (big companies' intern headcount is very likely full).
When the spring semester ends, start your summer internship, and at the same time start preparing for graduation and sending out résumés for full-time positions.
The schedule for finding a full-time job before graduation is similar to finding an internship, and whether you have good summer internship experience also matters a lot. Big-company interviews finish around November through January. As above, people who start in January or February are a step behind.
In your final semester, keep taking classes while sending out résumés for full-time jobs, and apply for OPT (temporary student work authorization) at the same time.
If you are not happy with your job, you can keep working while preparing to interview at better companies.
To be continued…

Good luck!

– Poly-Thinking

Tips for learning a new programming language (when you already know one)

Reference from here:
tips-for-learning-new-programming by Hanneli Tavante

  1. Install the simplest environment.
  2. Procedural Programming basics — creating and printing variables, creating and calling basic functions.
  3. Object Oriented Programming (OOP) basics — create a class, add attributes, methods and create an object.
  4. Test Driven Development (TDD) basics — write a test -> see it red -> do the minimal to make it green -> refactor with OOP.
  5. OOP principles — relationships between classes and objects, Abstraction, Polymorphism, Inheritance, Encapsulation and design patterns.
  6. Make analogies and compare between the language you already know and the language you are learning.
  7. Pair programming — take a simple problem and implement it while pair programming with someone who has experience in the language you are trying to learn.

For a functional programming language, the OOP steps may not apply; find an analogous path to learn it.

System Design: Mutual Exclusion on Distributed Systems

Scenario: x = 10. At some moment, machine A executes “x *= 10” while machine B executes “x += 10”. Either x = 110 or x = 200 is a correct outcome. Unfortunately, without coordination between A and B, the result can be x = 100 or x = 20 (a lost update).

Solution 1, Centralized Coordinator: C is the manager and grants permission to only one machine at a time. Everything is good except that C is a bottleneck for performance and a single point of failure.

Solution 2, Token Ring: machines are linked into a cycle/ring, and a token (the permission) is passed along the ring. Machine i acquires permission by waiting for the token to arrive from its left neighbor (i-1), and releases permission by passing the token to its right neighbor (i+1). The upside is that there is no single point of failure in this model, but it requires O(n) message transmissions.

Solution 3, Lamport’s Bakery Algorithm: see Wikipedia.

System Design: Endianness

Q: Why on earth do we have endianness?

A: It’s a matter of hardware. Computer memory is just a sequence of cells, each one a byte. A 32-bit processor has 32-bit registers that hold 32/8 = 4 bytes each. Endianness comes into play when data is loaded/stored between registers and memory. Big-endian stores the high (most significant) byte of a register at the lowest memory address; little-endian does the opposite.

Q: Why couldn’t the world agree on one universal standard of endianness?

A: Each endianness has its own strengths in specific application domains. Internet protocols and the telephone network are standardized on big-endian: just as with IP addresses and phone numbers, the high-order byte/digit (the network part or area code) is sent first. Little-endian simplifies arithmetic in assembly programming, because we humans start from the lowest digit when we add/subtract/divide. Also, with little-endian, reading the low 32 bits of a 64-bit number uses the same address as reading the whole number (again matching the calculation order).

Distributed Systems: Consistency Model

First, let’s study the formal definition from Leslie Lamport.

A high-speed processor may execute program operations out of order. A sequential processor guarantees that the result of an execution is the same as if the operations were executed in the order specified by the program.

A Sequentially-Consistent multi-process algorithm guarantees that the result of an execution is the same as if operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by the program.

In other words, reordering program operations is a common approach to code optimization (e.g., load operations are issued early because loads are slow), and both compilers and processors like to do it. Sequential consistency constrains such reorderings so that the result is still correct.

To sum up the key points of Sequential Consistency for further reference,

  • Each process/processor issues memory requests in the order specified by the program.
  • All memory requests to an individual memory location are served FIFO.

Notes: memory requests (usually loads/stores) are all we care about, because they are what is visible across different processes/processors.

System Design: A Thread Pool

Q: How does a server serve requests?
A: A server assigns each client request a dedicated thread. Those threads can be created on demand or maintained in a thread pool for reuse. The idea behind the thread pool pattern is that thread creation and destruction are expensive overheads compared to thread reuse.

Q: How does a thread pool work?
A: A thread pool is created with a parameter x indicating the default size of the pool. The size x can be tuned dynamically according to the service load. Each worker thread sleeps and is woken up when requests arrive. Incoming requests are enqueued FIFO. Moreover, the most recently active thread may get higher priority for reuse because its cache is still “hot” (cache locality).

Q: What would happen if x is not tuned properly?
A: Suppose y = 100 requests have arrived. With x = 10, requests pile up in the queue and latency grows. With x far larger than the concurrent load, say x = 1000, idle threads waste memory, and when many of them are runnable at once, the scheduler wastes CPU time on heavy context switching.

Q: What else is needed for implementing a thread pool?
A: A thread-safe queue and a clear understanding of semaphores (or mutexes and condition variables).

Algorithms: Longest Increasing Subsequence

*LIS is a classic concept/problem in dynamic programming, and I keep forgetting the O(n log n) solution. Hopefully this article will stamp it on my brain.

Description: find the longest increasing subsequence, e.g. an LIS of {1, 8, 5, 25, 10, 24, 19} is {1, 5, 10, 19}.

Solution: there are a few ideas to keep in mind for an optimal solution.

  • The word “subsequence” implies an incremental solution.
  • Among all candidate increasing subsequences of length L, we only consider/build upon the most extendible one (in this case, the one ending with the smallest value; a greedy attitude).
  • Also, when a new number is incrementally appended to the candidate sequences, we can update the candidate set in O(log n) using binary search (quite a subtle property).

Leetcode: Distinct Subsequences

Given two strings S and T, count how many distinct subsequences of S are exactly equal to T.

Let’s look closely at what a subsequence is.

If T is a subsequence of S, then each element of T can be assigned an index into S. A valid assignment is one in which the indices do not overlap and are in ascending order.

Then we can come up with a naive solution.

  1. for_each(T.begin(), T.end(), assign(char ch, S, range) {…}

Notice that the range parameter represents the constraint that, in a valid assignment, the indices are in ascending order.

But this isn’t all we are looking for. The question asks for the number of valid distinct assignments we can find in S, so we must traverse all possible states to produce the correct count.

Strawman solution 2: use backtracking to traverse all possible states.

Let’s get into the details of this solution; implementing the backtracking algorithm is the best way to do that.

class Solution {
public:
    int numDistinct(string S, string T) {
        return BT(S.c_str(), T.c_str());
    }

    int BT(const char* s, const char* t) {
        if (*t == '\0') return 1;
        if (*s == '\0') return 0;
        if (*s == *t) 
            return BT(s+1, t+1) + BT(s+1, t);
        return BT(s+1, t);
    }
};

This can’t be wrong, because backtracking is THE way to traverse all possible states! But hold on, what is the running time of this BT? The question is equivalent to asking how many sequences we traverse, and the answer is O(C(n, m)), where n is the length of the original string and m is the length of the target subsequence. That can grow almost as fast as 2^n (C(n, n/2) ≈ 2^n/√n), which is unacceptable for string processing (strings are large in real life). Do some states produce the same results; that is, are there duplicate states?

Solution 3: memoization.

Each call to BT() is identified by an intermediate state (si, ti). It’s not hard to tell that the overall BT process visits duplicate intermediate states; recomputation on duplicate states can be cut out by simply reusing the previously computed result.

class Solution {
    vector<vector<int>> mem;

public:
    int numDistinct(string S, string T) {
        mem = vector<vector<int>>(S.length(), vector<int>(T.length(), -1));
        return BT(S, T, 0, 0);
    }

    int BT(string& S, string&T, int si, int ti) {
        // a valid assignment has been found
        if (ti == T.length()) {
            return 1;
        }
        // reach the end, no valid assignment found.
        if (si == S.length()) {
            return 0;
        }
        // duplicate states?
        if (mem[si][ti] > -1)
            return mem[si][ti];

        // try to match S[si] with T[ti]
        int sum = 0;
        if (S[si] == T[ti]) {
            sum += BT(S, T, si+1, ti+1);
        }
        // try to match T[ti] with some later element in S.
        sum += BT(S, T, si+1, ti);
        mem[si][ti] = sum;
        return sum;
    }
};

Concurrency: Mutex vs Spinlock

Differences between mutexes and spinlocks:

  • If a thread fails to lock a mutex, it is put on a wait list and goes to sleep.
  • If a thread fails to acquire a spinlock, it busy-waits for the lock to be released.

It is essentially a trade-off between context switching and busy-waiting: spinlocks are good for short waits, while mutexes are good for long waits.

Reference: http://stackoverflow.com/questions/5869825/when-should-one-use-a-spinlock-instead-of-mutex