Stuff I do: 2012

Monday, 31 December 2012

Adding GC logs to hadoop child processes and analyzing the GC logs

Add the following to the mapred-site.xml file:

mapred.child.java.opts : Java opts for the task tracker child processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc.

Additional options: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC

Reference: http://hadoop.apache.org/docs/r1.0.0/mapred-default.html

Analyzing GC logs:
-------------------------

Meaning of the [GC [PSYoungGen: 230400K->19135K(268800K)] line is:

Around 262 MB (268800K) is the young generation size.
Before garbage collection, heap utilization in the young generation was around 225 MB (230400K).
After garbage collection it dropped to about 19 MB (19135K).
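
To check the arithmetic on such a line, here is a minimal Python sketch that pulls the three PSYoungGen numbers out and converts them to MB. The function name and the returned keys are my own (hypothetical), and the pattern only covers the PSYoungGen form shown above, not every GC log format.

```python
import re

# Matches e.g. "[PSYoungGen: 230400K->19135K(268800K)]"
YOUNG_GEN = re.compile(r"\[PSYoungGen: (\d+)K->(\d+)K\((\d+)K\)\]")


def parse_young_gen(line):
    """Return before/after/capacity of the young generation in MB, or None."""
    match = YOUNG_GEN.search(line)
    if match is None:
        return None
    before_kb, after_kb, capacity_kb = (int(g) for g in match.groups())
    return {
        "before_mb": before_kb / 1024,
        "after_mb": after_kb / 1024,
        "capacity_mb": capacity_kb / 1024,
    }


if __name__ == "__main__":
    print(parse_young_gen("[GC [PSYoungGen: 230400K->19135K(268800K)]"))
```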

Reference: http://middlewaremagic.com/weblogic/?p=5131
https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=22d56091-3a7b-4497-b36e-634b51838e11

Sunday, 30 December 2012

Playing around with Solr

The following article shows how to add a custom field type (for ISBN numbers) to be indexed by Solr:

http://robotlibrarian.billdueber.com/solr-field-type-for-numericish-ids/

Would love to write a custom one myself soon !

Thursday, 27 December 2012

Inspiring article on Big Data

http://gigaom.com/data/how-disney-built-a-big-data-platform-on-a-startup-budget/

"By the sheer power of its will (and ingenuity), a small team has been able to craft a large custom platform out of Hadoop, NoSQL databases and other open-source technologies. "

Saturday, 22 December 2012

How to download the historical versions of pages from google cache

Well, you can get historical versions of web pages from the Google cache. I am not sure how old these copies are; if any of you knows or cares to look it up, please let me know too :P

The query you need to use to get an older version of webpage www.mypage.com from Google Cache is as follows:
http://webcache.googleusercontent.com/search?q=cache:www.mypage.com

To download the page using the wget Linux command, you need to pass a user-agent string, as Google doesn't allow wget's default agent to fetch data. Quote the URL so that the shell does not try to interpret the '?'. Here is how you can automate it (replace AGENT with a browser user-agent string):

wget --output-document=out.htm --user-agent=AGENT --level=1 "http://webcache.googleusercontent.com/search?q=cache:www.mypage.com"

Interesting isn't it :)

Tuesday, 18 December 2012

Using awk to find the sum of a CSV file after group by on a particular column

http://www.theunixschool.com/2012/06/awk-10-examples-to-group-data-in-csv-or.html

I have a CSV file like :

65523 , 100
65522 , 900
65522 , 1800
65522 , 100
65522 , 100
65521 , 500
65521 , 200
65521 , 200

I need to find the sum of the 2nd column based on the grouping by the 1st column, so that the output looks something like:

65523 , 100
65522 , 2900
65521 , 900

SOLUTION:

This can be easily achieved using a single line awk script:

awk -F"," '{a[$1]+=$2;}END{for (i in a)print i, a[i];}' file

Awesome isn't it !! :)
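
For comparison, the same group-by-and-sum can be sketched in Python. This is only an illustration of what the awk one-liner does; the function name is my own, and unlike the awk version it strips the spaces around the comma.

```python
def sum_by_first_column(lines):
    """Sum the 2nd CSV column, grouped by the 1st column."""
    totals = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, value = (part.strip() for part in line.split(","))
        totals[key] = totals.get(key, 0) + int(value)
    return totals


if __name__ == "__main__":
    sample = ["65523 , 100", "65522 , 900", "65522 , 1800", "65521 , 500"]
    for key, total in sum_by_first_column(sample).items():
        print(key, ",", total)
```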

Sorting a CSV file based on a particular column list

http://stackoverflow.com/questions/9471101/sort-csv-file-by-column-priority-using-the-sort-command-unix

sort --field-separator=';' --key=2,1

sort -nr -t',' -k3

To sort based on multiple columns use the syntax:

sort --key=1,1 --key=2,2r --key=3,3 --key=4,4r
sort -k1,1 -k2,2r -k3,3 -k4,4r

as in the following transcript:

pax$ echo '5 3 2 9
3 4 1 7
5 2 3 1
6 1 3 6
1 2 4 5
3 1 2 3
5 2 2 3' | sort --key=1,1 --key=2,2r --key=3,3 --key=4,4r

1 2 4 5
3 4 1 7
3 1 2 3
5 3 2 9
5 2 2 3
5 2 3 1
6 1 3 6

Remember to provide the -n option if you want them treated as proper numbers (variable length), such as:

sort -n -k1,1 -k2,2r -k3,3 -k4,4r

Learning to use Screen command in Unix

http://kb.iu.edu/data/acuy.html

If your local computer crashes, or you are connected via a modem and lose the connection, the processes or login sessions you establish through screen don't go away. You can resume your screen sessions with the following command: screen -r

screen ==> Start a new screen
(Ctrl+A) & C ==> Start a new screen sub-window
(Ctrl+A) & K ==> Kill the current sub-window

(Ctrl+A) & (Shift + ") ==> Show the list of sub-windows in the current screen session

screen -r ==> reattach to a detached screen session
screen -ls ==> list of running screens
(Ctrl+A) & (Shift + A) ==> rename the current screen

Sunday, 16 December 2012

Setting up SVN on AWS EC2 instance

Steps to setup the SVN repository on the cloud instance (http://www.ange-agostini.com/blog/it/5-minutes-to-set-up-a-subversion-server-in-the-cloud.html):

Install subversion, apache and mod_dav_svn:

# sudo yum install mod_dav_svn subversion

Edit the Apache configuration file for subversion:

# sudo vi /etc/httpd/conf.d/subversion.conf

Replace subversion.conf content by:


LoadModule dav_svn_module     modules/mod_dav_svn.so
LoadModule authz_svn_module   modules/mod_authz_svn.so
<Location /repos>
   DAV svn
   SVNParentPath /var/www/svn
   # Limit write permission to list of valid users.
   AuthType Basic
   AuthName "Authorization Realm"
   AuthUserFile /var/www/svn-auth/passwd
   AuthzSVNAccessFile  /var/www/svn-auth/access
   Require valid-user
</Location>

Create the directory which will contain the subversion repository:

# sudo mkdir /var/www/svn

Create the directory which will contain the permissions files.

 # sudo mkdir /var/www/svn-auth

Create the permission file:

# sudo vi /var/www/svn-auth/access
[/]
<theUser> = rw

Note: Replace <theUser> by the login you want to use to access your repository.
<theUser> will have read write access to all repositories.
It is possible to setup authorization by group or user for each repository.

Create the password file:

# sudo htpasswd -cb /var/www/svn-auth/passwd <theUser> <thePassword>

Note: Replace <theUser> by the login you want to use to access your repository. Replace <thePassword> by the password you want.

Create a repository (here project1):

# cd /var/www/svn
# sudo svnadmin create project1

Change files authorization:

# sudo chown -R apache.apache /var/www/svn /var/www/svn-auth
# sudo chmod 600 /var/www/svn-auth/access /var/www/svn-auth/passwd

Start apache web server:

# sudo service httpd start

Note: to restart server use # sudo service httpd restart

Test subversion

Now subversion and apache should work.
Open a web browser and point to the URL : http://<Public DNS of your EC2 instance>/repos/project1
You should be prompted for your credential (Enter <theUser> <thePassword>) before accessing the repository

Subversion client

We are now going to interact with our repository from a windows PC.
If you don’t have a subversion client installed on your PC then you can install one from http://www.sliksvn.com/en/download .
You can test your subversion client from your PC by listing files on your repository:

svn ls http://<Public DNS of your EC2 instance>/repos/project1

The first time we often want to import some files to the repository:

svn import -m "Initial import." <path of the directory on your PC where the files are> http://<Public DNS of your EC2 instance>/repos/project1

---------------------------------------------------------------------------------------------------------------------------------
ANOTHER WAY:

Steps given at https://forums.aws.amazon.com/thread.jspa?messageID=209468:

I signed up for an EC2 free tier tonight intending to use it as a Subversion server. The whole process took a couple of hours to setup, but can be compressed down to just a few minutes. If anybody else is looking to do the same, here is what I did:

1) Create the Linux-flavored micro instance.
2) Give it a Security Group that opens port 3690 to the sources of your choice. The following example allows SVN access from all Internet sources:

| tcp | 3690 | 3690 | 0.0.0.0/0
| udp | 3690 | 3690 | 0.0.0.0/0

3) SSH into the micro instance and run the command "sudo yum install subversion".
4) Create the directory to be used for subversion, for example "/home/ec2-user/svn".
5) Enter "svnadmin create /home/ec2-user/svn" (or whichever path you specified).
6) Edit two files: svnserve.conf and passwd both found in the "conf" directory of your newly created repository source.
In svnserve.conf, unremark the lines for "anon-access" (but change from "read" to "none") and "auth-access". Additionally, unremark the "password-db = passwd" line, but don't change it, and unremark the "realm =" line, providing a realm name of your choice.
In the passwd file, simply add a line entry with the user name(s) and password(s) to be used for access.
7) Finally, run "svnserve -d -r /home/ec2-user/svn" (or again, whichever path you specified) to kick off the instance.

Your Subversion server should now be available at the URL: svn://<yourinstancehostorip>/

Keep in mind the security for this quick and dirty setup is very minimal, and the daemon has not been configured for automatic startup at this point.

Reading:

http://jonathanhui.com/install-configure-subversion-ec2-amazon-linux

Saturday, 15 December 2012

Setting up Nutch on MAC - issues [Solved]

I setup nutch following the official tutorial : http://wiki.apache.org/nutch/NutchTutorial

However, I got this error when I tried to run the command "bin/nutch crawl urls -dir crawl -depth 3 -topN 5" :

2012-12-15 14:43:13.028 java[82636:c07] *** Terminating app due to uncaught exception 'JavaNativeException', reason: 'KrbException: Could not load configuration from SCDynamicStore'
*** First throw call stack:
(
0 CoreFoundation 0x00007fff8b94cfc6 __exceptionPreprocess + 198
1 libobjc.A.dylib 0x00007fff87621d5e objc_exception_throw + 43
2 CoreFoundation 0x00007fff8b9d72a9 -[NSException raise] + 9
3 JavaNativeFoundation 0x0000000108aa1c3f JNFCallStaticVoidMethod + 213
4 libjava.jnilib 0x0000000108ac1169 Java_sun_security_krb5_SCDynamicStoreConfig_installNotificationCallback + 450
5 JavaNativeFoundation 0x0000000108aa4182 JNFPerformEnvBlock + 86
6 SystemConfiguration 0x00007fff8703d3b8 rlsPerform + 119
7 CoreFoundation 0x00007fff8b8bb6e1 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
8 CoreFoundation 0x00007fff8b8baf4d __CFRunLoopDoSources0 + 253
9 CoreFoundation 0x00007fff8b8e1d39 __CFRunLoopRun + 905
10 CoreFoundation 0x00007fff8b8e1676 CFRunLoopRunSpecific + 230
11 java 0x00000001081c0843 java + 18499
12 java 0x00000001081c029a java + 17050
13 java 0x00000001081bda98 java + 6808
)
terminate called throwing an exceptionAbort trap: 6

About SCDynamicStore : http://developer.apple.com/library/mac/#documentation/Networking/Reference/SCDynamicStore/Reference/reference.html

SOLUTION:

Put the following line in the bin/nutch file :

NUTCH_OPTS="$NUTCH_OPTS -Djava.security.krb5.realm= -Djava.security.krb5.kdc="

based on https://issues.apache.org/jira/browse/HADOOP-7489

Wednesday, 21 November 2012

GIT getting started

A very good tutorial on git : http://www.sbf5.com/~cduan/technical/git/

GIT commands:

Git Pull
From what I understand, git pull will pull down from a remote whatever you ask (so, whatever branch you’re asking for) and instantly merge it into the branch you’re in when you make the request. Pull is a high-level request that runs ‘fetch’ then a ‘merge’ by default, or a rebase with ‘--rebase’. You could do without it, it’s just a convenience.
Git fetch
Fetch is similar to pull, except it won’t do any merging.
Git clone

Git clone will clone a repo into a newly created directory. It’s useful for when you’re setting up your local copy.

If you get stuck, run ‘git branch -a’ and it will show you exactly what’s going on with your branches. You can see which are remotes and which are local.

http://betterexplained.com/articles/aha-moments-when-learning-git/
http://jonas.nitro.dk/git/quick-reference.html

Reading: http://blog.mikepearce.net/2010/05/18/the-difference-between-git-pull-git-fetch-and-git-clone-and-git-rebase/

Friday, 9 November 2012

Finding your gateway IP address in Linux, windows and MAC

Original Source: http://wiki.amahi.org/index.php/Find_Your_Gateway_IP

Windows:
- Click Start > All Programs > Accessories > Command Prompt.
- When Command Prompt is open, type the following command: ipconfig | findstr /i "Gateway" (You can copy & paste it in the command prompt; just right-click anywhere in the command prompt window and select Paste.)
- You should see something like this:
  C:\Documents and Settings\administrator>ipconfig | findstr /i "Gateway"
  Default Gateway . . . . . . . . . : 192.168.1.1
- In this example, your default gateway (router) IP address is 192.168.1.1.

Linux:
- You'll need to open a Terminal. Depending on your Linux distribution, it can be located in the menu items at the top, or at the bottom of your screen. In this example, we will use Fedora. Click Applications > System Tools > Terminal.
- When terminal is open, type the following command: ip route | grep default
- The output of this should look something like the following:
  joe$ ip route | grep default
  default via 192.168.1.1 dev eth0 proto static
- In this example, again, 192.168.1.1 is your default gateway (router) IP address.

Mac OS X:
- Open the Terminal application. To do this, click Finder > Applications > Utilities > Terminal.app.
- When Terminal.app is open, type the following command: netstat -nr | grep default
- This will output the following:
  joe$ netstat -nr | grep default
  default 192.168.1.1 UGSc 50 46 en1
- In this example, again, 192.168.1.1 is your default gateway (router) IP address.

Tuesday, 6 November 2012

Java Applications Monitoring and Instrumentation using JMX

Finally managed to integrate JMiniX, the HTTP web interface for JMX ( http://code.google.com/p/jminix/wiki/Documentation ).

To download the dependency jars:

Download the pom.xml of the project JMiniX
Run the Maven command "mvn compile" in the folder containing that pom.xml; this downloads the dependency jars to the .m2 Maven folder in your user profile folder.
Copy these jars to your project's lib.

Important Note: In the web.xml you must specify the url pattern as "<url-pattern>/jmx/*</url-pattern>" otherwise JMiniX doesn't seem to work.

Reading: http://www.ibm.com/developerworks/java/library/j-jtp09196/index.html

http://www.mularien.com/blog/2007/11/09/5-minute-guide-to-spring-and-jmx/
http://today.java.net/pub/a/today/2005/12/20/introduction-to-jmxremote.html

Using Spring with vanilla Servlets without MVC

http://blogs.sourceallies.com/2012/02/injecting-spring-beans-into-java-servlets/

http://www.codeproject.com/Tips/251636/How-to-inject-Spring-beans-into-Servlets

http://andykayley.blogspot.in/2008/06/how-to-inject-spring-beans-into.html

http://static.springsource.org/spring/docs/3.0.0.M3/reference/html/ch04s11.html

Saturday, 3 November 2012

Debugging hadoop mapreduce jobs using eclipse in local setup

Add the following line to 'conf/hadoop-env.sh':

export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5009"

Then setup eclipse to connect to the above port (5009) using remote debugging configuration.

Detailed steps: http://code.google.com/p/hadoop-clusternet/wiki/DebuggingJobsUsingEclipse

I tried running this in the standalone configuration of Hadoop.
I still have to try it in pseudo-distributed mode as well; does it work there?

Here is an article that claims to debug the daemon processes:
http://srinathsview.blogspot.in/2012/05/debugging-hadoop-task-tracker-job.html

Routing protocols

A good link to read about routing, various routing protocols IGP, OSPF and BGP is Module 7 at the link: http://nptel.iitm.ac.in/courses/Webcourse-contents/IIT%20Kharagpur/Computer%20networks/New_index1.html

http://nptel.iitm.ac.in/courses/IIT-MADRAS/Computer_Networks/index.php

Learning Hadoop

Read the first 4 chapters of "Hadoop in Action" by Chuck Lam. These chapters introduce the basics of the MapReduce and Hadoop framework in easy language.

Run the word count example which comes with the hadoop installation.

http://hadoop.apache.org/docs/r0.20.1/quickstart.html
http://hadoop.apache.org/docs/r0.20.1/mapred_tutorial.html
https://confluence.guavus.com/display/GTR/Install+and+Setup+Hadoop+on+Mac

Exercise: Implement the median function in an efficient manner using Hadoop.

Saturday, 27 October 2012

String searching problem

Given a TEXT string, say "abcmanmfaxyzabcn", and a SEARCH string "man", find the number of occurrences of the SEARCH string in the TEXT string. The letters of the SEARCH string can be separated by other characters in the TEXT string, e.g. "man" occurs twice in "mabcayzn", with 'a' occurring at two different positions.

Solution:

First build a boolean array of 26 chars and store the chars which occur in the SEARCH string.
Next remove all characters from the TEXT string which don't occur in the SEARCH string using the above array. This allows us to reduce the problem size by removing unwanted characters from TEXT string.
The first approach I gave to the interviewer was a recursive function findSubstring(String text, String search), which looks for search[0] at every position in text and then calls itself on the remaining subproblem. However, the time complexity of this solution is exponential.
Next I came up with a dynamic programming solution. In this we create a matrix res[m][n] with the SEARCH string chars as the rows and the TEXT string chars as the columns, where res[i][j] is the number of ways to match SEARCH[0..i] with SEARCH[i] placed exactly at TEXT[j]. We calculate res[i][j] as:

res[i][j] = 0 if SEARCH[i] != TEXT[j]
res[i][j] = 1 if SEARCH[i] == TEXT[j] and i == 0
res[i][j] = sum of res[i-1][k] for all 0 <= k < j, otherwise

The answer is the sum of the last row, res[m-1][j] over all j. If the sum over k is kept as a running prefix sum for each row, the time complexity of this solution is O(mn), which is a lot faster than the recursive solution.
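
Here is a minimal Python sketch of the dynamic programming idea for this problem. It is not the exact matrix from the interview: it uses the common one-row variant, where ways[i] counts the ways to match the first i SEARCH characters in the TEXT read so far. The function name is my own.

```python
def count_occurrences(text, search):
    """Count how many times `search` occurs in `text` as a subsequence."""
    ways = [0] * (len(search) + 1)
    ways[0] = 1  # the empty prefix of `search` always matches once
    for ch in text:
        # Go right to left so each text character extends a match only once.
        for i in range(len(search), 0, -1):
            if search[i - 1] == ch:
                ways[i] += ways[i - 1]
    return ways[len(search)]


if __name__ == "__main__":
    print(count_occurrences("mabcayzn", "man"))
```

The two nested loops give O(mn) time and the single array gives O(m) extra space.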

Design a timer class

Design a timer class. The users of your class should be able to register to your class with the time in secs and their callbacks. These callbacks are called as the time expires.

The timer runs in its own thread, and it sleeps for 1 sec (since, the interviewer had asked for 1 second as the granularity)
It maintains a list of 'Task/Runnable' along with the remaining time to expire.
Every time the timer wakes up after a sec, it reduces the expiry time for each task by 1 sec.
And for all those tasks for which the expiry reaches zero, their callbacks are called.
I suggested a faster way (where one doesn't need to iterate over the list of tasks) if the timer is limited to a fixed maximum number of seconds: use a circular array with one slot per second.
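
A rough Python sketch of the circular-array idea, under my own assumptions: the class and method names are hypothetical, delays are whole seconds between 1 and max_seconds, and tick() is called by hand here so that the behaviour is easy to follow. In a real timer, a background thread would call tick() after sleeping for one second.

```python
class TimerWheel:
    """Circular array of callback lists, one slot per second."""

    def __init__(self, max_seconds):
        # One extra slot so a delay of max_seconds never lands on the current slot.
        self.slots = [[] for _ in range(max_seconds + 1)]
        self.current = 0

    def register(self, delay_seconds, callback):
        if not 1 <= delay_seconds < len(self.slots):
            raise ValueError("delay out of range for this wheel")
        index = (self.current + delay_seconds) % len(self.slots)
        self.slots[index].append(callback)

    def tick(self):
        """Advance one second and run the callbacks that expire now."""
        self.current = (self.current + 1) % len(self.slots)
        due, self.slots[self.current] = self.slots[self.current], []
        for callback in due:
            callback()


if __name__ == "__main__":
    wheel = TimerWheel(max_seconds=5)
    wheel.register(2, lambda: print("two seconds passed"))
    wheel.tick()
    wheel.tick()
```

Each tick only touches the tasks in one slot, so there is no per-second pass over all registered tasks.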

http://www.java2s.com/Code/Java/Development-Class/AsimpleimplementationoftheJava13javautilTimerAPI.htm

http://www.artima.com/intv/blochP.html

Linked list Duplication

Given a linked list which has the node structure as shown below:

class Node{
Node val; // note that the value also randomly points to some other node in the LL
Node next;
}

How would you duplicate this linked list ?

Hint: I used an external hashmap DS in my solution.
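
The hashmap approach can be sketched in Python roughly as follows. The Node class mirrors the structure above, and copy_list is a name I made up. The first pass creates one clone per node, the second pass wires up next and val through the map.

```python
class Node:
    def __init__(self):
        self.val = None   # points to some other node in the list (or None)
        self.next = None


def copy_list(head):
    """Return a deep copy of the list, preserving the `val` cross-links."""
    clones = {}
    node = head
    while node is not None:          # pass 1: one clone per original node
        clones[node] = Node()
        node = node.next
    node = head
    while node is not None:          # pass 2: translate both pointers
        clone = clones[node]
        clone.next = clones.get(node.next)
        clone.val = clones.get(node.val)
        node = node.next
    return clones.get(head)


if __name__ == "__main__":
    a, b = Node(), Node()
    a.next, a.val, b.val = b, b, a
    copy = copy_list(a)
    print(copy.val is copy.next)
```

This takes O(n) time and O(n) extra space for the map; the linked article avoids the extra map.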

Better solution:
http://www.geeksforgeeks.org/a-linked-list-with-next-and-arbit-pointer/

Microsoft Interview Question

Given two numbers as a linked list of digits, write a program to add the numbers and return the result in the new linked list.

Microsoft Interview Questions

Make the mirror image of the given binary tree.
Design the data structure and algorithm to help in spell correction of words. The correct words are given in alphabetical order in the dictionary file.
Write a function which tests whether the given binary tree is a binary search tree or not?
Merge two given BSTs in linear time.
Using getchar() take user input until user presses Enter key. And then print the user input in the reverse order.

Adobe Interview Questions

A rectangle is cut from another rectangle (any orientation). Find the line which divides both rectangles into 2 equal halves.
Do the tree traversal level by level. After each level print the new line i.e. ‘\n’.
Solve the 12 coin problem to find the different coin in 3 weightings.
A king has 1000 barrels of wine, out of which one is poisoned. Use min. no of prisoners to identify the poisoned bottle.
Modified tower of Hanoi. Every disk move is through an intermediate bar.
Find the no. of ways in which the sum of coins can be obtained using the coins x,y,z.
Sorting of very large file, which does not fit in memory.

Google Interview Questions

You are given a sorted array (a[]) and a value sum. Find the pairs a[i] and a[j] in the sorted array s.t. a[i] + a[j] = sum. Your algorithm should be linear in time and space.
In the tennis tournament find the second best player.

Monday, 8 October 2012

Find all contiguous sub-arrays whose sum is zero.

For example:

i/p: 1,2,3,-3,-2,-1
o/p: [2,3], [1,4], [0,5]

You can do a running sum of all the array elements to get the tmp[]:
=> 1 3 6 3 1 0

Now in this array you need to find all duplicates, triplets, etc. Each pair of equal values marks the indices between which the sub-array sums to zero. A running sum that is itself zero means the sub-array from the start of the array up to that index sums to zero.

Now to find the duplicates, we can store the tmp[] values as keys in a hashmap, with the list of array indices as the values. All indices that fall under the same key give us the solutions.

Design data structure for merging meeting schedules

http://stackoverflow.com/questions/8346689/design-data-structure-for-merging-meeting-schedules

Add two numbers without using arithmetic operators

http://www.geeksforgeeks.org/archives/18324

Store an N-ary tree structure in Relational database table

I was asked this question during my Tower Research interview for the position of Senior script developer.
I came up with the solution of storing it in a self-referencing table with two columns (parent, child). The interviewer pointed out that with this kind of table, finding the depth of a particular node requires as many queries as the depth of that node. He asked how the data structure could be improved to store this kind of hierarchy in tables.

http://iamcam.wordpress.com/2006/03/24/storing-hierarchical-data-in-a-database-part-2a-modified-preorder-tree-traversal/
http://www.sitepoint.com/hierarchical-data-database-2/

Other questions:

Linux:

Find all files open under a process -> ls -l /proc/{PID}/fd

How to setup a password-less SSH

Remove the soft-links in a directory and replace them with the copy of the original file.

List all files in a directory recursively which are not owned by a particular user.

Array Based:

Given 2K+1 numbers out of which K numbers occur twice each and one number occurs just once, find this one time occurring number. -> used XOR

Given numbers from 1 to K with 2 numbers missing, find the 2 missing numbers. -> used XOR based partitions.
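
Both XOR tricks can be sketched in Python as below. The function names are mine. For the two missing numbers, the XOR of everything gives a ^ b, and any set bit of that value separates a from b, which is the partition step.

```python
def find_single(nums):
    """Every number occurs twice except one; pairs cancel out under XOR."""
    result = 0
    for value in nums:
        result ^= value
    return result


def find_two_missing(nums, k):
    """nums holds 1..k with two numbers missing; return them in sorted order."""
    combined = 0
    for value in range(1, k + 1):
        combined ^= value
    for value in nums:
        combined ^= value          # combined == a ^ b
    low_bit = combined & -combined  # a bit where a and b differ
    a = 0
    for value in range(1, k + 1):
        if value & low_bit:
            a ^= value
    for value in nums:
        if value & low_bit:
            a ^= value             # only one missing number has this bit set
    b = combined ^ a
    return tuple(sorted((a, b)))


if __name__ == "__main__":
    print(find_single([4, 1, 2, 1, 2]))
    print(find_two_missing([1, 2, 4, 6], 6))
```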

Java:

When is finally called. What if you return from inside try and have a return inside finally as well ? What if finally throws an exception ?

About the synchronized() keyword.

Database:

How would you store a tree in the relational tables?

Interviewer: Manikant K.

How to design a web forum

This was asked to me as a design question in amazon telephonic round. The interviewer wanted to know the DB design and the web-app design for a forum which supports around 100K users with the following use-cases:

- Sign up
- Posting
- Replying
- Searching [keywords + dates based]

I told him we could have following DB schemas:

Thread [tid, subject, date]
Post [pid, tid, uid, msg, datetime ]
User [uid, role, emailid, password_hash ...]
Role [rid, list flags(for permission etc.)...]

The plain DB handles the use cases for sign-up, posting and replying, but could become a bottleneck for search if we run the search against the Post table with a where clause such as like '%keyword%'. To improve this, I suggested running an independent background process that parses the msgs in the Post table and stores the important keywords from each message in a separate datastore/B-Tree structure, so that the keywords are indexed for faster searches. The value stored for each keyword is the 'pid' (post id) in the table. This should improve the performance of the query. The interviewer seemed to like the proposal. Let me know if you guys can think of a better strategy for faster searches.
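
To make the keyword-index idea concrete, here is a toy in-memory sketch in Python. Everything in it is an assumption for illustration: the function names, the tiny stop-word list, and the plain dict standing in for the separate datastore. It maps each keyword to the set of pids whose msg contains it.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "how"}


def keywords(text):
    """Lower-case the text and keep the words that are not stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def build_index(posts):
    """posts maps pid -> msg; the index maps keyword -> set of pids."""
    index = defaultdict(set)
    for pid, msg in posts.items():
        for word in keywords(msg):
            index[word].add(pid)
    return index


def search(index, query):
    """Return the pids of posts containing every keyword of the query."""
    words = keywords(query)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


if __name__ == "__main__":
    posts = {1: "How to tune Tomcat threads", 2: "Tomcat and Hadoop logs"}
    print(search(build_index(posts), "tomcat"))
```

A lookup now costs one dictionary access per keyword instead of a scan over every message.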

After this he wanted me to move to the Java side and explain the design there. I told him that we would make classes/objects similar to the tables above, define the relationships between those classes and then define the operations on these classes.
For components, I told him I would use the MVC architecture, where the views would be JSPs and the Controller would be the servlets, which extract data from the views, call the business logic in the Model and then fill the results into a new view that is returned to the user. He asked me to delve a little deeper into the Model components. I told him this would basically have two main layers: one is the DAO layer (which interacts with the DB) and the other is the service layer (which is called by the controller and internally calls the DAO classes for DB interaction). The service classes would use the above set of objects.
PS: I wasn't very comfortable with the component design part, so any comments are welcome.

Sunday, 7 October 2012

Chat server based on Apache Mina

Initially thought of making a chat server based on tomcat, but came across apache mina and apache async-web products which seem to be exciting and more scalable as well.

http://mina.apache.org/

Configuring the maximum no. of request threads for the tomcat server

Reference: http://tomcat-configure.blogspot.in/2009/01/tomcat-maxthreads-configuration.html
Reading: http://tomcat.apache.org/tomcat-5.5-doc/config/http.html
http://tomcat.apache.org/tomcat-6.0-doc/aio.html

Tomcat maxThreads configuration

Tomcat maxThreads represents the maximum number of request
processing threads to be created by the HTTPConnector.

<Connector port="8080" address="localhost"     
     maxThreads="250" maxHttpHeaderSize="8192"
     emptySessionPath="true" protocol="HTTP/1.1"
     enableLookups="false" redirectPort="8443" acceptCount="100"
     connectionTimeout="20000" disableUploadTimeout="true" />

This determines the maximum number of simultaneous requests that can be handled. If not specified, this attribute is set to the default value of 200.

How the process works:

At server startup, the HTTP Connector will create a number of processing threads based on the value configured for the minSpareThreads attribute.
Each incoming request requires a thread for the duration of that request.
If the number of simultaneous requests cannot be handled by the currently available request processing threads, additional threads will be created up to the configured maximum (the value of the maxThreads attribute).
If still more simultaneous requests are received, they are queued up to the configured maximum (the value of the acceptCount attribute).
Any further simultaneous requests will receive "connection refused" errors, until resources are available to process them.

Guidelines for maxThreads:
maxThreads is an important tuning parameter. However, if you are hitting an error like:

org.apache.tomcat.util.threads.ThreadPool logFull SEVERE: All threads (150) are
currently busy, waiting. Increase maxThreads (150) or check the servlet status

you should first investigate whether the real problem is individual requests taking too long: are your threads returning to the pool? If, for example, database connections are not released, threads pile up waiting to obtain a database connection, making it impossible to process additional requests. This is a problem in your webapp.
Take a thread dump to find out where they're stuck. Raising maxThreads too far has costs:

It consumes a good chunk of memory.
Your system will spend too much time context switching.

So once you have already optimized your application, try raising your maxThreads attribute up to 500-750. I wouldn't advise creating larger Connectors; rather, if 750 connections are not enough, create a cluster configuration with several Tomcat instances. For example, 2 instances of Tomcat each with maxThreads=500 instead of a single Tomcat with maxThreads=1000.
If you want to learn more about JDK tuning read this tutorial which contains many tips valid also for tomcat:
http://www.mastertheboss.com/en/jboss-application-server/113-jboss-performance-tuning-1.html

Wednesday, 29 August 2012

Next Greater Element

Given an array, print the Next Greater Element (NGE) for every element. The Next greater Element for an element x is the first greater element on the right side of x in array. Elements for which no greater element exist, consider next greater element as -1.

Amazing solution in the link below, based on a stack for O(n) time. I just loved it :P

Reading:http://www.geeksforgeeks.org/archives/8405
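
A Python sketch of the stack idea as I understand it (the function name is mine). The stack holds the indices of elements still waiting for their next greater element; each index is pushed and popped at most once, which gives O(n) time.

```python
def next_greater_elements(nums):
    """Return a list where result[i] is the next greater element of nums[i], or -1."""
    result = [-1] * len(nums)
    waiting = []  # indices whose next greater element is not known yet
    for i, value in enumerate(nums):
        # The current value answers every smaller element still waiting.
        while waiting and nums[waiting[-1]] < value:
            result[waiting.pop()] = value
        waiting.append(i)
    return result


if __name__ == "__main__":
    print(next_greater_elements([4, 5, 2, 25]))
```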

Tuesday, 28 August 2012

How to find the longest palindrome in a string in linear time

Solution:

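My first guess was finding the LCS (longest common sub-sequence) between the string and its reverse. However, that gives the longest palindromic sub-sequence rather than a contiguous palindrome, and it takes quadratic time. The known linear-time method for the longest palindromic substring is Manacher's algorithm.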
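
For reference, here is a Python sketch of Manacher's algorithm, the standard linear-time method for the longest palindromic substring (a swapped-in technique, not the LCS idea above). The function name is mine. Separators are inserted between characters so that odd and even length palindromes are handled the same way.

```python
def longest_palindrome(s):
    """Return one longest palindromic substring of s in O(n) time."""
    if not s:
        return ""
    t = "#" + "#".join(s) + "#"   # "aba" -> "#a#b#a#"
    n = len(t)
    radius = [0] * n              # palindrome radius around each position of t
    center = right = 0            # rightmost palindrome found so far
    for i in range(n):
        if i < right:
            # Reuse the radius of the mirror position inside the known palindrome.
            radius[i] = min(right - i, radius[2 * center - i])
        lo, hi = i - radius[i] - 1, i + radius[i] + 1
        while lo >= 0 and hi < n and t[lo] == t[hi]:
            radius[i] += 1
            lo -= 1
            hi += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    best_len, best_center = max((r, i) for i, r in enumerate(radius))
    start = (best_center - best_len) // 2
    return s[start:start + best_len]


if __name__ == "__main__":
    print(longest_palindrome("forgeeksskeegfor"))
```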

anand tech blog ??

Microsoft Interview- Two elements of BST are swapped by mistake. You have to restore the tree without changing its structure.

Solution:

// incomplete solution

Node fixSwappedBST(Node r, int min, int max, Node toFix) {

    // r - root node, r.val >= min && r.val <= max and toFix is the node which needs to be fixed and was at a wrong place in the left subtree of the common ancestor (of the swapped nodes)

    if (r == null) return null;
    if (r.val < min || r.val > max) { // wrong node, so fix it
        if (toFix != null) { // swap the nodes
            int tmp = toFix.val; toFix.val = r.val; r.val = tmp;
        }
        else { toFix = r; }
        fixSwappedBST(r.left, min..
    }
    else {
        fixSwappedBST(r.left, min, r.val, toFix);
        fixSwappedBST(r.right, r.val, max, toFix);
    }
}

Another approach:

Do an in-order traversal of the BST (a pre-order traversal would not work, because only the in-order sequence of a BST is sorted). The sequence will be out of order at one place (if the swapped nodes are adjacent in the traversal) or at two places. Take the first element of the first out-of-order pair and the last element of the last out-of-order pair.

Then swap the values of these two nodes. This restores the BST without changing its structure.
Solution: http://www.geeksforgeeks.org/archives/23616
Reference: http://www.geeksforgeeks.org/archives/23088
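
A Python sketch of the traversal approach, using an in-order walk since that is the order in which a BST is sorted. The class and function names are mine. It remembers the first element of the first out-of-order pair and the second element of the last out-of-order pair, then swaps their values.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right


def inorder(root):
    """Return the values of the tree in in-order."""
    if root is None:
        return []
    return inorder(root.left) + [root.val] + inorder(root.right)


def fix_swapped_bst(root):
    """Restore a BST in which the values of two nodes were swapped."""
    first = second = prev = None
    stack, node = [], root
    while stack or node is not None:      # iterative in-order traversal
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        if prev is not None and prev.val > node.val:
            if first is None:
                first = prev              # larger value that came too early
            second = node                 # smaller value that came too late
        prev = node
        node = node.right
    if first is not None:
        first.val, second.val = second.val, first.val
    return root


if __name__ == "__main__":
    # Correct tree is 2 <- 4 -> 6; here 2 and 6 are swapped.
    tree = TreeNode(4, TreeNode(6), TreeNode(2))
    print(inorder(fix_swapped_bst(tree)))
```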

Given a linked list containing 0s,1s or 2s. Sort it.

Solution:
Maintain three different linked lists each having its own start pointer (LL0, LL1, LL2) and keep adding the elements to the respective lists.
After the traversal you have three linkedlists (LL0, LL1 and LL2), now just join LL1 to the end of LL0 and LL2 to the end of LL1.

Another approach:
Keep three counters (c0,c1 and c2) and in the end reset the linked list elements as per the counters.

http://www.geeksforgeeks.org/archives/23088
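
The counter-based approach can be sketched in Python like this. The ListNode class and the helper names are my own. The first pass counts the 0s, 1s and 2s; the second pass overwrites the node values in sorted order.

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val, self.next = val, next


def from_list(values):
    """Build a linked list from a Python list (helper for the example)."""
    head = None
    for value in reversed(values):
        head = ListNode(value, head)
    return head


def to_list(head):
    """Collect the node values into a Python list (helper for the example)."""
    values = []
    while head is not None:
        values.append(head.val)
        head = head.next
    return values


def sort_012(head):
    """Sort a linked list of 0s, 1s and 2s in place by counting."""
    counts = [0, 0, 0]
    node = head
    while node is not None:
        counts[node.val] += 1
        node = node.next
    node = head
    for value in range(3):
        for _ in range(counts[value]):
            node.val = value
            node = node.next
    return head


if __name__ == "__main__":
    print(to_list(sort_012(from_list([1, 2, 0, 2, 1, 0, 0]))))
```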

Friday, 24 August 2012

Recursive function to check a string for palindrome

boolean isPalindrome(String s){

    if(s.length()<=1) return true;

    // substring's end index is exclusive, so length()-1 drops only the last character
    if(s.charAt(0) == s.charAt(s.length()-1)) return isPalindrome(s.substring(1,s.length()-1));
    else return false;

}

Thursday, 23 August 2012

Handling concurrency in a Web application

Reading: http://stackoverflow.com/questions/7455726/handling-the-concurrent-request-while-persisting-in-oracle-database

To handle concurrency in a web site, a common practice is to have a column on each record that allows you to check that it has not been updated since you read it. This is either a last-update date or a sequential version number (auto-incremented by a trigger).

Typically you will read the data (plus the concurrency column)

SELECT seat,etc,version_no
FROM t1
WHERE column = a_value

Then when the user eventually gets round to booking the seat the update will work unless there has been an update.

(the version number or update date will change after every update)

BEGIN
    UPDATE t1
    SET seatTaken = true
    WHERE seatid = .....
    AND version_no = p_version
    RETURNING version_no INTO p_version;
    IF SQL%ROWCOUNT = 0 THEN
        -- Concurrency violation: the record has already been updated.
        RAISE_APPLICATION_ERROR(-20001, 'Record was updated by another session');
    END IF;
END;

The trigger to auto-update the version number would look a little like this:

CREATE OR REPLACE TRIGGER t1_version
-- has to be a BEFORE trigger: :new values cannot be changed in an AFTER trigger
BEFORE INSERT OR UPDATE ON t1
FOR EACH ROW
BEGIN
    IF :new.version_no IS NULL THEN
       :new.version_no  := 0;
    ELSE
       :new.version_no  := :old.version_no  + 1;
    END IF;
END;
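
The same version check can be sketched on the application side. This is only an in-memory illustration of the idea, not how the linked answer does it: the SeatStore class, its methods and the seat id are all made up, and a ConcurrentHashMap stands in for the table. The book() call succeeds only if the version the caller read earlier is still the current one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SeatStore {
    // Immutable snapshot of one row: the data plus its concurrency column.
    static final class Seat {
        final boolean taken;
        final int version;
        Seat(boolean taken, int version) {
            this.taken = taken;
            this.version = version;
        }
    }

    private final Map<String, Seat> seats = new ConcurrentHashMap<>();

    void addSeat(String seatId) {
        seats.put(seatId, new Seat(false, 0));
    }

    // Corresponds to the SELECT: the caller keeps seat.version for later.
    Seat read(String seatId) {
        return seats.get(seatId);
    }

    // Corresponds to the UPDATE ... AND version_no = p_version.
    // Returns false when the row changed since the caller read it.
    boolean book(String seatId, int expectedVersion) {
        Seat current = seats.get(seatId);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        // replace() only succeeds if the map still holds exactly 'current',
        // so two concurrent bookings cannot both win
        return seats.replace(seatId, current, new Seat(true, current.version + 1));
    }
}
```

A caller would read the seat, show it to the user, and later call book(seatId, seat.version); a false result plays the role of the custom concurrency exception.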

Saturday, 18 August 2012

How to design a good API

From a talk by Joshua Bloch: http://www.youtube.com/watch?v=aAb7hSCtvGw
Slides: http://lcsd05.cs.tamu.edu/slides/keynote.pdf

API design is very similar to language and language-feature design.
An API should be easy to evolve and hard to misuse.
Gather requirements with a healthy degree of skepticism. Define use cases from the requirements. Often it is easier and more rewarding to solve a more general form of a problem than a specific one, so keep your mind open.
Keep the initial spec of the API short (about 1 page). Discuss it with as many people as you can and then flesh it out as confidence builds.
Write to the API (i.e. write clients against the API interface) as you design it. This is very important, as it lets you explore the use cases the API is being designed to solve.
An API should do ONE thing and do it well.

Reading: http://www.artima.com/intv/blochP.html

Sunday, 12 August 2012

State Design Pattern

Read the chapter on the State design pattern from the Head First Design Patterns book.

They explain the GumBallMachine, which can be in different states and needs to perform each action in a different way depending on the current state.

To start with, they declare the action() methods in the GumBallMachine class itself, and in each action method they use an if-else ladder to check the current state and act accordingly. They then show how this style of state machine coding clutters the action methods.

In the next approach they make a State interface which lists the common action methods. There is then a concrete state class for each state, which implements this interface and performs each action() the way that state requires. This allows you to store the current_state object in the GumBallMachine class and simply call the state.action() methods from it; depending on the current state, the right implementation is called for each action(). This decouples each state from the other states and makes adding new states easier.
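
A cut-down sketch of that second approach. The state names, the two actions and the returned messages are my own simplification for illustration and do not match the book's code exactly. Each state class decides what an action does and which state comes next; the machine only delegates.

```java
// State interface: one method per action the machine supports.
interface State {
    String insertQuarter(GumBallMachine m);
    String turnCrank(GumBallMachine m);
}

class NoQuarterState implements State {
    public String insertQuarter(GumBallMachine m) {
        m.setState(m.hasQuarter);
        return "quarter accepted";
    }
    public String turnCrank(GumBallMachine m) {
        return "insert a quarter first";
    }
}

class HasQuarterState implements State {
    public String insertQuarter(GumBallMachine m) {
        return "quarter already inserted";
    }
    public String turnCrank(GumBallMachine m) {
        m.count--;
        m.setState(m.count > 0 ? m.noQuarter : m.soldOut);
        return "gumball dispensed";
    }
}

class SoldOutState implements State {
    public String insertQuarter(GumBallMachine m) { return "sold out"; }
    public String turnCrank(GumBallMachine m) { return "sold out"; }
}

class GumBallMachine {
    // one shared instance per state
    final State noQuarter = new NoQuarterState();
    final State hasQuarter = new HasQuarterState();
    final State soldOut = new SoldOutState();

    int count;              // gumballs left
    private State state;    // the current_state object

    GumBallMachine(int count) {
        this.count = count;
        this.state = count > 0 ? noQuarter : soldOut;
    }

    void setState(State s) { state = s; }

    // No if-else ladder here: the current state object handles the action.
    String insertQuarter() { return state.insertQuarter(this); }
    String turnCrank() { return state.turnCrank(this); }
}
```

Adding a new state then means adding one class and the transitions into it, without touching the action methods of the machine.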

Saturday, 11 August 2012

Find the longest increasing subsequence

It's a DP problem:

Let L[i] be the length of the longest increasing subsequence ending at (and including) the ith array element.
=> L[i] = 1 + MAX(L[j]) over all (j<i)&&(a[j]<a[i]), or 1 if there is no such j. The answer is the maximum L[i] over all i.

O(n^2) solution.
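
A small sketch of this O(n^2) recurrence, assuming "increasing" means strictly increasing (equal elements do not extend a subsequence). The class and method names are made up for illustration.

```java
class Lis {
    // Returns the length of the longest strictly increasing subsequence of a.
    static int longestIncreasingSubsequence(int[] a) {
        int n = a.length;
        int[] L = new int[n];   // L[i]: length of the LIS ending at a[i]
        int best = 0;
        for (int i = 0; i < n; i++) {
            L[i] = 1;           // a[i] on its own
            for (int j = 0; j < i; j++) {
                if (a[j] < a[i] && L[j] + 1 > L[i]) {
                    L[i] = L[j] + 1;
                }
            }
            best = Math.max(best, L[i]);
        }
        return best;
    }
}
```

For example, for {10, 22, 9, 33, 21, 50, 41, 60} the L array comes out as {1, 2, 1, 3, 2, 4, 4, 5}, so the answer is 5 (10, 22, 33, 50, 60).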

http://www.geeksforgeeks.org/archives/12832

Friday, 3 August 2012

Maximum size square sub-matrix with all 1s

http://www.geeksforgeeks.org/archives/6257

Maximum Product Subarray

Given an array that contains both positive and negative integers, find the product of the maximum product subarray. Expected Time complexity is O(n) and only O(1) extra space can be used.

Solution:
I. The solution that came to my mind first is to use a way similar to Kadane's algorithm for finding the maximum sum of a subarray. But in this case, when (prd<0) we can't just discard it like we do for the sum, because a future -ve number might make this the highest product :P.
So along with the largest product of a subarray ending at the current element (max_prd) we also have to keep the smallest, i.e. most negative, one (min_prd). Multiplying by a -ve number turns the smallest product into the largest, and a zero resets both to 0.

public static void main(String[] args) {
    int a[] = {-2, -3, 0, -2, -40};
    // max_prd / min_prd: largest / smallest product of a subarray ending at a[i]
    int max = a[0], max_prd = a[0], min_prd = a[0];
    for (int i = 1; i < a.length; i++) {
        // extend the previous best/worst subarray by a[i], or start afresh at a[i]
        int p1 = max_prd * a[i];
        int p2 = min_prd * a[i];
        max_prd = Math.max(a[i], Math.max(p1, p2));
        min_prd = Math.min(a[i], Math.min(p1, p2));
        if (max < max_prd) {
            max = max_prd;
        }
    }

    System.out.println("max prd is:" + max);

}

source: http://www.geeksforgeeks.org/archives/22216