Saturday, 27 October 2012

String searching problem

Given a TEXT string say "abcmanmfaxyzabcn" and a search string "man". Find the number of occurrences of the SEARCH string "man" in the given text string. The alphabets of the search string can be separated by other characters in between them in the text string e.g. "man" occurs twice in "mabcayzn" with 'a' occurring at two different positions.

Solution:
  1. First build a boolean array of 26 chars and store the chars which occur in the SEARCH string.
  2. Next remove all characters from the TEXT string which don't occur in the SEARCH string using the above array. This allows us to reduce the problem size by removing unwanted characters from TEXT string.
  3. The first approach i gave to the interviewer was to use a recursive function, which passes 2 arguments to the function findSubstring(String text, String search) and this searches search[0] in all the places in text and then calls the function on the remaining subproblem. However the time complexity of this solution is exponential.
  4. Next i came up with a dynamic programming solution. In this we create a matrix res[m][n] with TEXT string chars as the columns and the SEARCH string chars as the rows. Now we calculate  res[i][j] as:
res[i][j] =   0  if SEARCH[i] != TEXT[j]
                          Sum of all res[i-1][k] for all  0<k<j

The time complexity of this solution is O(mn) and is a lot faster than the recursive solution.

Design a timer class

Design a timer class. The users of your class should be able to register to your class with the time in secs and their callbacks. These callbacks are called as the time expires.


  • The timer runs in its own thread, and it sleeps for 1 sec (since, the interviewer had asked for 1 second as the granularity)
  • It maintains a list of  'Task/Runnable' along with the remaining time to expire.
  • Every time the timer wakes up after a sec, it reduces the expiry time for each task by 1 sec.
  • And for all those tasks for which the expiry reaches zero, their callbacks are called.
  • I suggested a faster way (where one doesn't need to iterate the list of tasks) if the timer was limited to a fixed number of seconds using an circular array of seconds.

Linked list Duplication

Given a linked list which has the node structure as shown below:

class Node{
  Node val; // note that the value also randomly points to some other node in the LL
  Node next;
}

How would you duplicate this linked list ?

Hint: I used an external hashmap DS in my solution.

Better solution:
http://www.geeksforgeeks.org/a-linked-list-with-next-and-arbit-pointer/

Microsoft Interview Question

Given two numbers as a linked list of digits, write a program to add the numbers and return the result in the new linked list.

Microsoft Interview Questions


  1. Make the mirror image of the given binary tree.
  2. Design the data structure and algorithm to help in spell correction of words. The correct words are given in alphabetical order in the dictionary file.
  3. Write a function which tests whether the given binary tree is a binary search tree or not?
  4. Merge two given BSTs in linear time.
  5. Using getchar() take user input until user presses Enter key. And then print the user input in the reverse order.

Adobe Interview Questions


  1. A rectangle is cut from another rectangle (any orientation). Find the line which divides both rectangles into 2 equal halves.
  2. Do the tree traversal level by level. After each level print the new line i.e. ‘\n’.
  3. Solve the 12 coin problem to find the different coin in 3 weightings.
  4. A king has 1000 barrels of wine, out of which one is poisoned. Use min. no of prisoners to identify the poisoned bottle.
  5. Modified tower of Hanoi. Every disk move is through an intermediate bar.
  6. Find the no. of ways in which the sum of coins can be obtained using the coins x,y,z.
  7. Sorting of very large file, which does not fit in memory.

Google Interview Questions


  1. You are given a sorted array (a[]) and a value sum. Find the pairs a[i] and a[j] in the sorted array s.t. a[i] + a[j] = sum. Your algorithm should be linear in time and space.
  2. In the tennis tournament find the second best player.

Monday, 8 October 2012

Find all contiguous sub-arrays whose sum is zero.

For example:

i/p: 1,2,3,-3,-2,-1
o/p: [2,3], [1,4], [0,5]

You can do a running sum of all the array elements to get the tmp[]:
=> 1 3 6 3 1 0

Now in this array you need to find all duplicates,triplets etc.. All such pairs represent indices of numbers between which the sum of sub-array is zero.

Now to find the duplicates, we can store the tmp[] values as the key in the hashmap with the array index as the values. So, all indices which fall to same key gives us the solution.

Design data structure for merging meeting schedules

Add two numbers without using arithmetic operators

Store an N-ary tree structure in Relational database table

I was asked this question during my tower research interview for the position of Senior script developer.
I came up with the solution to store it in a self referring table which has two columns (parent,child). So, in this kind of table he said that finding the depth of any particular node, i would have to write/call queries equaling the depth of the node under consideration. He asked how can you improve the data structure to store this kind of structure in tables ??

http://iamcam.wordpress.com/2006/03/24/storing-hierarchical-data-in-a-database-part-2a-modified-preorder-tree-traversal/
http://www.sitepoint.com/hierarchical-data-database-2/

Other questions:
Linux:
Find all files open under a process -> ls -l /proc/{PID}/fd
How to setup a password-less SSH
Remove the soft-links in a directory and replace them with the copy of the original file.
List all files in a directory recursively which are not owned by a particular user.

Array Based:
Given 2K+1 numbers out of which K numbers occur twice each and one number occurs just once, find this one time occurring number. -> used XOR
Given numbers from 1 to K with 2 numbers missing, find the 2 missing numbers. -> used XOR based partitions.

Java:
When is finally called. What if you return from inside try and have a return inside finally as well ? What if finally throws an exception ?
About the synchronized() keyword.

Database:
How would you store a tree in the relational tables?

Interviewer: Manikant K.

How to design a web forum

This was asked to me as a design question in amazon telephonic round. The interviewer wanted to know the DB design and the web-app design for a forum which supports around 100K users with the following use-cases:


 -Sign up
- Posting
- Replying
- Searching [keywords + dates based]

I told him we could have following DB schemas:


Thread [tid, subject, date]
Post [pid, tid, uid, msg, datetime ]
User [uid, role, emailid, password_hash ...]
Role [rid, list flags(for permisssion etc.)...]

The plain DB solves the use cases for Sign-up, posting and replying but could become a bottle neck for search if we run a search query based on query in the Post table (with a where clause as 'like "%keyword%"). So, for improving the performance of this, i suggested running an independent background process which parses the msgs in the post table and stores the important keywords from this messsage in the  independent datastore/B-Tree structure to be able to index the keywords for faster searches. The corresponding value for the keywords is the 'pid' (post id) in the table. This should improve the performance of the query. the interviewer seemed to have liked the proposal. Let me know if you guys can think of a better strategy for faster searches.

After this he wanted me to move to the Java side and explain the design there. I told that we would make similar classes/objects like the tables above, define relationship between those classes and then define the operations on these classes.
For components, I told him i would use the MVC architecture, where the views would be JSPs and Controller would be the interacting servlets which would extract data from the views, call the business logic from the Model and then fill the results in a new view which is returned to the user. He asked me to delve a little deeper into the Model components. I told this would basically have two main layers one is DAO (which interacts with the DB) and other is the service layer (which is called by the controller and internally calls the DAO classes for DB interaction). The service classes would be using the above set of objects.
PS: I wasn't very comfortable with the component design part, so any comments are welcome.

Sunday, 7 October 2012

Chat server based on Apache Mina

Initially thought of making a chat server based on tomcat, but came across apache mina and apache async-web products which seem to be exciting and more scalable as well.

http://mina.apache.org/

Configuring the maximum no. of request threads for the tomcat server

Reference: http://tomcat-configure.blogspot.in/2009/01/tomcat-maxthreads-configuration.html
Reading: http://tomcat.apache.org/tomcat-5.5-doc/config/http.html
http://tomcat.apache.org/tomcat-6.0-doc/aio.html

Tomcat maxThreads configuration

Tomcat maxThreads represents the maximum number of request
processing threads to be created by the HTTPConnector.



<Connector port="8080" address="localhost"     
     maxThreads="250" maxHttpHeaderSize="8192"
     emptySessionPath="true" protocol="HTTP/1.1"
     enableLookups="false" redirectPort="8443" acceptCount="100"
     connectionTimeout="20000" disableUploadTimeout="true" />





This determines the maximum number of simultaneous requests that can be handled. If not specified, this attribute is set to the default value of 200.

How the process works:



  • At server startup, the HTTP Connector will create a number of processing threads based on the value configured for theminSpareThreads attribute.
  • Each incoming request requires a thread for the duration of that request.
  • If the number of simultaneous requests cannot be handled by the currently available request processing threads, additional threads will be created up to the configured maximum (the value of the maxThreads attribute).
  • If still more simultaneous requests are received, they are stacked up up to the configured maximum (the value of the acceptCount attribute).
  • Any further simultaneous requests will receive "connection refused" errors, until resources are available to process them.
Guidelines for maxThreads:
maxThreads is an important tuning parameter, however if you are reaching an error like:
org.apache.tomcat.util.threads.ThreadPool logFull SEVERE: All threads (150) are
currently busy, waiting. Increase maxThreads (150) or check the servlet status
you should at first investigate if it's rather a problem of individual requests taking too long: are your threads returning to the pool? if, for example, database connections are not released, threads pile up waiting to obtain a database connection thereby making it impossible to process additional requests. This is a problem in your webapp.
Take a thread dump to find out where they're stuck. Increasing too much maxThreads will lead to :
  • Consume a good chunk of memory.
  • Your system will spend too much time context switching
So once you have already optimized your application try raising you maxThread attribute up to 500-750. I wouldn't advice to create larger Connectors, rather if 750 Connections are not enough create a Cluster configuration with several Tomcat instances. For example 2 instances of tomcat each one with maxThreads=500 instead of a single Tomcat with maxThreads=1000
If you want to learn more about JDK tuning read this tutorial which contains many tips valid also for tomcat:
http://www.mastertheboss.com/en/jboss-application-server/113-jboss-performance-tuning-1.html

Wednesday, 29 August 2012

Next Greater Element

Given an array, print the Next Greater Element (NGE) for every element. The Next greater Element for an element x is the first greater element on the right side of x in array. Elements for which no greater element exist, consider next greater element as -1.

Amazing solution in the link below, based on a stack for O(n) time. I just loved it :P

Reading:http://www.geeksforgeeks.org/archives/8405

Tuesday, 28 August 2012

How to find the longest palindrome in a string in linear time

Solution:

Probably finding the LCS (longest common sub-sequence) between the string and its reverse.

anand tech blog ??

Microsoft Interview- Two elements of BST are swapped by mistake. You have to restore the tree without changing its structure.

Two elements of BST are swapped by mistake. You have to restore the tree without changing its structure.

Solution:

// incomplete solution

Node fixSwappedBST ( Node r, int min, int max, Node toFix){

// r - root node, r.val >= min && r.val <= max  and toFix is the node which needs to be fixed and was at a wrong   place in the left subtree of the common ancestor (of the swapped nodes)

 if(r==null) return null;
 if(r.val<min || r.val > max){ // wrong node, so fix it
   if(toFix != null) { // swap the nodes  
    int tmp = toFix.val; toFix.val=r.val; r.val=tmp;
   }
   else{ toFix = r; }
   fixSwappedBST(r.left, min..
 }
else{
  fixSwappedBST(r.left,  min, r.val,toFix);
  fixSwappedBST(r.right,r.val, max,toFix);
 }
}


Another approach:

Do a Pre-order traversal of the BST. It would be wrong at two places, identify those two positions in the Pre-order traversal. Identify the first element of the pair where it is out of order and the last element of the pair where it is again out of order.

In the next traversal, swap these two nodes. Would this work ??
Solution: http://www.geeksforgeeks.org/archives/23616
Reference: http://www.geeksforgeeks.org/archives/23088

Given a linked list containing 0s,1s or 2s. Sort it.

Given a linked list containing 0s,1s or 2s. Sort it.

Solution:
Maintain three different linked lists each having its own start pointer (LL0, LL1, LL2)  and keep adding the elements to the respective lists.
After the traversal you have three linkedlists (LL0, LL1 and LL2), now just join LL1 to the end of LL0 and LL2 to the end of LL1.

Another approach:
Keep three counters (c0,c1 and c2) and in the end reset the linked list elements as per the counters.

http://www.geeksforgeeks.org/archives/23088

Friday, 24 August 2012

Recursive function to check a string for palindrome

boolean isPalindrome(String s){

  if(s.length()<=1) return true;

  if(s.charAt(0) == s.charAt(s.length()-1)) return isPalindrome(s.substring(1,s.length()-2));
  else return false;

}

Thursday, 23 August 2012

Handling concurrency in a Web application


To handle concurrency in a web site a common practice it to have a column on each record that allows you to check it has not been updated since you got it. Either last update date or a sequential version number (auto incremented by a trigger).
Typically you will read the data (plus the concurrency column)
SELECT seat,etc,version_no
FROM t1
WHERE column = a_value
Then when the user eventually gets round to booking the seat the update will work unless there has been an update.
(the version number or update date will change after every update)
BEGIN
    UPDATE t1
    SET seatTaken = true
    WHERE seatid = .....
    AND version_no = p_version
    RETURNING version_no INTO p_version;
EXCEPTION WHEN NOT_FOUND THEN
    --Generate a custom exception 
    --concurrency viloation the record has been updated alreadyEND;
the trigger to auto update the version number would look a little like this
CREATE OR REPLACE TRIGGER t1_version
AFTER INSERT OR UPDATE ON t1
FOR EACH ROWBEGIN
    IF :new.version_no IS NULL THEN
       :new.version_no  := 0;
    ELSE
       :new.version_no  := :old.version_no  + 1;
    END IF;
END;

Saturday, 18 August 2012

How to design a good API

From  a talk by Joshua Bloch: http://www.youtube.com/watch?v=aAb7hSCtvGw
Slides: http://lcsd05.cs.tamu.edu/slides/keynote.pdf

  • API design is very similar to Language and language feature design.
  • Easy to evolve, hard to misuse
  • Gather requirements with healthy degree of skepticism. Define use cases from the requirements. Often it is easier and more rewarding to solve a more general form of problem than solving a specific problem so keep your mind open.
  • Keep the initial spec of the API short/1 page. And discuss with as many people and then flesh it out as confidence builds.
  • Write to the API (means write clients using API interface) as you design the API, this is very important as it allows you to explore the use cases API is being designed to solve.
  • API should do ONE thing good.


Sunday, 12 August 2012

State Design Pattern

Read the chapter on State design patterns from Head First Design Patterns book.

They explain the GumBallMachine, which can be in different states and needs to perform each action in a different way depending on the current state.

To start with they declare the action() methods in the GumBallMachine class only and in each action method they use if-else ladder to check the current state and take action based on the current state. And show how this type of state machine coding makes the action methods cluttered.

And in the next approach they make a State interface which has list of common action methods. And then there are concrete state classes for each state, which implement this interface and perform the actions() depending on the current state(i.e. the class for the state). This allows you to store the current_state object in the GumBallMachine class and call the state.action() methods inside this class. Depending on the current state the apt methods are called for each action(). This decouples each state from other states and makes adding new states easier.

Saturday, 11 August 2012

Find the longest increasing subsequence

Its a DP problem:

Let L[j] be the length of the longest increasing sub sequence including jth array element.
=> L[i] = MAX(L[j]+1) for all (j<i)&&(a[j]<=a[i]).

O(n^2) solution.

http://www.geeksforgeeks.org/archives/12832

Friday, 3 August 2012

Maximum size square sub-matrix with all 1s

Maximum Product Subarray

Given an array that contains both positive and negative integers, find the product of the maximum product subarray. Expected Time complexity is O(n) and only O(1) extra space can be used.


Solution: 
I. The solution that came to my mind first is to "Use a way similar to Kadane's algorithm for finding the maximum sum of a sub array.", but in this case the when (prd<0) we can't just discard it like we do it fro sum, bcoz a future -ve number might make this the highest product :P.
We would have to keep an integer storing the last_neg_num_prod to store the last -ve product upto a point and then set (prd=1).


public static void main(String[] args) {
int a[] = {-2, -3, 0, -2, -40};
int max= Integer.MIN_VALUE, last_neg_prod=0, prd= 1;
for(int i=0;i<a.length;i++){
prd = prd * a[i];
out:
if(prd<0){
if(last_neg_prod<0){ // we have an old -ve product, we can combine these to get a higher +ve product then
prd = prd*last_neg_prod;
last_neg_prod = 0;
break out;
}
last_neg_prod = prd;
prd = 1;
}
else if(prd ==0){
last_neg_prod = 0;
prd=1;
}
if(max<prd){
max = prd;
}
}

System.out.println("max prd is:"+max);


}



source: http://www.geeksforgeeks.org/archives/22216

KMP String searching algorithm

KMP is an algorithm to find the occurrence of string 'W' in another string 'S'.

1. Creates a partial match table T. T[i] stores the length of the longest proper suffix of W[0..(i-1)] which is a proper prefix of W.

How T is used in the main KMP algorithm:
At any given time, the algorithm is in a state determined by two integers, m and i, which denote respectively the position within S which is the beginning of a prospective match for W, and the index in W denoting the character currently under consideration.


T indicates where we need to look for the start of a new match in the event that the current one ends in a mismatch. The entries of T are constructed so that if we have a match starting at S[m] that fails when comparing S[m + i] to W[i], then the next possible match will start at index m + i - T[i] in S (that is, T[i] is the amount of "backtracking" we need to do after a mismatch).

http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm#Efficiency_of_the_KMP_algorithm

Generate the list of ugly numbers

Sunday, 8 July 2012

How to make the session in a Web Application secure


  • Using HTTPS to transfer session id and cookies as well. (If cookie is transferred over HTTP it can be easily stolen and used for impersonation)
  • Sending an entropic unique token in the hidden field in the page. The server authenticates the user based on this token, otherwise rejects the user request.
  • Session id/token should be unpredictable. Could be generated using java.security.SecureRandom class in Java or as explained in the post: http://technocratme.blogspot.in/2012/04/universally-unique-identifier-uuid.html
  • To confirm the authenticity of the sender of request, one can compare the sender's IP address with the IP of the user who created the session.

Prime No., GCD, LCM calculations


// Calculate all primes upto N, set the boolean to true for all composites

  boolean isComposite[] = new boolean[N+1];
// calculate all primes upto N
for(int i=2;i*i<=N;i++){
if(!isComposite[i]){ // if prime
for(int j=i*i;j<=N;j+=i){
isComposite[j] = true;
}
}
}

// Calculate the GCD of 2 no.s using recursive euclidean algorithm

  private static long gcd(int a, int b) {
if(a<b){
// swap numbers
a = a^b;
b = b^a;
a = a^b;
}
if(b == 0)
return a;
return gcd(b,a%b);
}

private static long lcm(int a, int b) {
return a*b/gcd(a,b);
}


EQUATIONS



Solution:
1/x + 1/y = 1/N!
=> x = y*N!/(y-N!)

Say y = N!+Z,
=> x = (N!+Z)*N! / Z = N!^2 / Z + N!.
For 'x' to be integer, 'Z' can take values equal to the no. of divisors of N!^2.

No .of divisors is calculated in the following manner: http://technocratme.blogspot.in/2012/07/divisor-counting.html

import java.util.Scanner;

public class EQUATIONS {

public static int getPowerOfPrimeInNFact(int N,int prime){
int pow = N/prime;
int tmp = 0;
while(pow>0){
tmp +=pow;
pow = pow/prime;
}
return tmp;
}
public static void main(String args[]) throws Exception {
Scanner in = new Scanner((System.in));
//System.out.println(getPowerOfPrimeInNFact(18,3));
int N = in.nextInt();
boolean isComposite[] = new boolean[1000001];
int [] powPrimes = new int[1000001];
// calculate all primes in first 1 million
for(int i=2;i<=1000;i++){
if(!isComposite[i]){ // if prime
for(int j=i*i;j<=1000000;j+=i){
isComposite[j] = true;
}
}
}
long no_div = 1;
for(int i=2;i<=N;i++){
if(!isComposite[i]){ // if prime
//System.out.println(i);
// Find powers of all primes
powPrimes[i] = getPowerOfPrimeInNFact(N,i);
// find the continual mod of the tot. no. of divisors of N!*N!
no_div = ((2*powPrimes[i]+1)*no_div)%1000007;
}
}
System.out.println(no_div);
}
}

Saturday, 7 July 2012

Daemon vs Non Daemon thread in Java

 A Thread can be made a daemon, which makes it run in the background. The latter also affects VM termination behavior: the VM does not terminate automatically as long as there are non-daemon threads running.
A daemon Thread only runs as long as there are non-daemon Threads running. When the last non-daemon Thread ends, the whole program ends no matter if it had daemon Threads still running or not; i.e. A running daemon thread can be terminated by the VM, however a running non-daemon thread can not be terminated by the VM.

An example:
Imagine a Java application. It starts off with just one thread, the "main" thread which runs the main method. This is a regular (non-daemon) thread. Now imagine again that the program starts another thread which sits in a loop doing something. If the new thread is a regular (non-daemon) thread, it will prevent the program from finishing when "main" ends, and keep going forever!

Now, this isn't always what is required. Sometimes you want to end this "background" processing when the program finishes. To do this, you can mark threads as daemon threads. Daemon threads don't prevent the program from ending, but stop when the main thread, stops:

  1. Thread hello = new HelloThread();  
  2. hello.setDaemon(true);  



A classic example of a daemon thread is the garbage collector thread found in many Java Virtual Machines. It needs to run continuously while any other threads are running, but should not prevent the program from exiting. When the program exits, there is no more need for a garbage collector.


Making a jar runnable

This is based on using the MANIFEST.MF file inside the jar, which points to the 'Main class' of the jar and setting the class path too.

http://www.crazysquirrel.com/computing/java/basics/runnable-jar.jspx

Java Shutdown Hooks

 A shut down hook is simply a Thread that is registered with the JVM and run just before the JVM shuts down.


Example code taken from the below mentioned link:




package example;public class ShutdownHook {

  
/**
   * 
@param args
   */
  
public static void main(String[] args) {
    
Hook hook = new Hook();
    System.out.println
"Running Main Application..." );
    Runtime.getRuntime
().addShutdownHookhook );
    System.out.println
"Exiting." );
  
}

  
  
private static class Hook extends Thread {
    
public void run() {
      
System.out.println"Running Clean Up..." );
    
}
  }
}



http://www.crazysquirrel.com/computing/java/basics/java-shutdown-hooks.jspx

http://stackoverflow.com/questions/2921945/useful-example-of-a-shutdown-hook-in-java

Java Heap dumps and Thread dumps

THREAD DUMP
To take the thread dump of the running java process with process id = PID, use: "kill -3 PID".


The thread dump is written to the system out of the VM on which you executed the kill -3. If you are redirecting the console output of the JVM to a file, the thread dump will be in that file. If the JVM is running in an open console, then the thread dump will be displayed in its console.


You could alternatively use jstack (Included with JDK) to take a thread dump and write the output wherever you want

jstack PID > outfile



HEAP DUMP


You can use jdk utility 'jmap' to take the heap dump.


And use the jdk tool 'jhat' to analyse java heaps.


http://www.startux.de/java/45-java-heap-dump

Divisor counting

Count the number of divisors of a number given its prime factors in the form:


If a number n look like this:

n = a^p * b^q * c^r * d^s * e^t * f^u ... ,

where a,b, c and so on are its prime factors

then the number of divisors of n = (p+1)*(q+1)*(r+1)*...


This is because we can choose any number of a's from 0 to p, which gives us (p+1) choices, and so on.

http://mathforum.org/library/drmath/view/55741.html

String Similarity - Z function


For two strings A and B, we define the similarity of the strings to be the length of the longest prefix common to both strings. For example, the similarity of strings "abc" and "abd" is 2, while the similarity of strings "aaa" and "aaab" is 3.
Calculate the sum of similarities of a string S with each of it's suffixes.

Solution:
The solution is based on calculating the Z function of the string, where Z(i) is the maximum overlap length of the string S[0..] and S[i..].

Algorithm

Given a string S of length n, the Z Algorithm produces an array Z where Z[i] is the length of the longest substring starting from S[i]which is also a prefix of S, i.e. the maximum k such that S[j] = S[i + j] for all 0 ≤ j < k. Note that Z[i] = 0 means that S[0] ≠ S[i]. For easier terminology, we will refer to substrings which are also a prefix as prefix-substrings.

The algorithm relies on a single, crucial invariant. As we iterate over the letters in the string (index i from 1 to n - 1), we maintain an interval [L, R] which is the interval with maximum R such that 1 ≤ L ≤ i ≤ R and S[L...R] is a prefix-substring (if no such interval exists, just let L = R =  - 1). For i = 1, we can simply compute L and R by comparing S[0...] to S[1...]. Moreover, we also get Z[1] during this.

Now suppose we have the correct interval [L, R] for i - 1 and all of the Z values up to i - 1. We will compute Z[i] and the new [L, R]by the following steps:
  • If i > R, then there does not exist a prefix-substring of S that starts before i and ends at or after i. If such a substring existed, [L, R]would have been the interval for that substring rather than its current value. Thus we "reset" and compute a new [L, R] by comparing S[0...] to S[i...] and get Z[i] at the same time (Z[i] = R - L + 1).
  • Otherwise, i ≤ R, so the current [L, R] extends at least to i. Let k = i - L. We know that Z[i] ≥ min(Z[k], R - i + 1) becauseS[i...] matches S[k...] for at least R - i + 1 characters (they are in the [L, R] interval which we know to be a prefix-substring). Now we have a few more cases to consider.
  • If Z[k] < R - i + 1, then there is no longer prefix-substring starting at S[i] (or else Z[k] would be larger), meaning Z[i] = Z[k] and[L, R] stays the same. The latter is true because [L, R] only changes if there is a prefix-substring starting at S[i] that extends beyond R, which we know is not the case here.
  • If Z[k] ≥ R - i + 1, then it is possible for S[i...] to match S[0...] for more than R - i + 1 characters (i.e. past position R). Thus we need to update [L, R] by setting L = i and matching from S[R + 1] forward to obtain the new R. Again, we get Z[i] during this.
The process computes all of the Z values in a single pass over the string, so we're done. Correctness is inherent in the algorithm and is pretty intuitively clear.

Analysis

We claim that the algorithm runs in O(n) time, and the argument is straightforward. We never compare characters at positions less thanR, and every time we match a character R increases by one, so there are at most n comparisons there. Lastly, we can only mismatch once for each i (it causes R to stop increasing), so that's another at most n comparisons, giving O(n) total.

Code

Simple and short. Note that the optimization L = R = i is used when S[0] ≠ S[i] (it doesn't affect the algorithm since at the next iterationi > R regardless).

int L = 0, R = 0;
for (int i = 1; i < n; i++) {
  if (i > R) {
    L = R = i;
    while (R < n && s[R-L] == s[R]) R++;
    z[i] = R-L; R--;
  } else {
    int k = i-L;
    if (z[k] < R-i+1) z[i] = z[k];
    else {
      L = i;
      while (R < n && s[R-L] == s[R]) R++;
      z[i] = R-L; R--;
    }
  }
}

Application

One application of the Z Algorithm is for the standard string matching problem of finding matches for a pattern T of length m in a string Sof length n. We can do this in O(n + m) time by using the Z Algorithm on the string T Φ S (that is, concatenating TΦ, and S) where Φis a character that matches nothing. The indices i with Z[i] = m correspond to matches of T in S.

Lastly, to solve Problem B of Beta Round 93, we simply compute Z for the given string S, then iterate from i to n - 1. If Z[i] = n - i then we know the suffix from S[i] is a prefix, and if the largest Z value we've seen so far is at least n - i, then we know some string inside also matches that prefix. That gives the result.

int maxz = 0, res = 0;
for (int i = 1; i < n; i++) {
  if (z[i] == n-i && maxz >= n-i) { res = n-i; break; }
  maxz = max(maxz, z[i]);
}
http://codeforces.com/blog/entry/3107