Saturday, 24 March 2018

Given two sorted linked lists, find the path of maximum sum.

You are given two sorted linked lists. You start at the head of either list and move forward to the end. You may switch to the other list only at a point of intersection (i.e. where both lists contain a node with the same value). You have to find the path of maximum sum.

Eg
  1->3->30->90->120->240->511
  0->3->12->32->90->125->240->249
  You can switch at 3, 90 or 240, so the max sum path is
  1->3->12->32->90->125->240->511
Sol:

This can be solved in O(m+n).

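Take two pointers p1 and p2, one per list, starting at the heads.
Maintain two sums curr1 and curr2, one per list, initialised to 0, plus a running total for the chosen path.
  1. If p1's value == p2's value (an intersection):
    • if curr1 > curr2: choose LL1 as the path up to this point.
    • else: choose LL2 as the path.
    • add the larger sum and the shared value to the total, reset curr1 and curr2 to 0, and advance both pointers.
  2. else if p1's value < p2's value:
    • curr1 += p1's value
    • advance p1 to the next node.
  3. else (p2's value < p1's value):
    • curr2 += p2's value
    • advance p2 to the next node.
  4. if p1 == null:
    • traverse the rest of LL2, adding each value to curr2.
    • add the greater of curr1 and curr2 to the total.
  5. if p2 == null: do the same as above for the rest of LL1.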
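
The steps above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the Node class and the from_list helper are made up for the example, and each list is assumed to be strictly increasing (no repeated values within one list).

```python
class Node:
    """Hypothetical singly linked list node used only for this example."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def from_list(values):
    """Build a linked list from a Python list and return its head."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def max_sum_path(h1, h2):
    """Return the largest path sum, switching lists only at equal values."""
    p1, p2 = h1, h2
    curr1 = curr2 = 0  # sums since the last intersection, one per list
    total = 0
    while p1 and p2:
        if p1.val < p2.val:
            curr1 += p1.val
            p1 = p1.next
        elif p2.val < p1.val:
            curr2 += p2.val
            p2 = p2.next
        else:
            # Intersection: keep the better segment, count the shared value once.
            total += max(curr1, curr2) + p1.val
            curr1 = curr2 = 0
            p1, p2 = p1.next, p2.next
    # One list is exhausted; finish summing whatever remains of the other.
    while p1:
        curr1 += p1.val
        p1 = p1.next
    while p2:
        curr2 += p2.val
        p2 = p2.next
    return total + max(curr1, curr2)


a = from_list([1, 3, 30, 90, 120, 240, 511])
b = from_list([0, 3, 12, 32, 90, 125, 240, 249])
print(max_sum_path(a, b))  # 1014 = 1+3+12+32+90+125+240+511
```

Each node is visited once, which is where the O(m+n) bound comes from.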

Find all the Nodes at the distance K from a given node in a Binary tree. Print them in any order.

Sol:

This can be divided into two cases:

  1. It is easy to print the nodes at distance K that lie in the subtree of the given node. Pass an integer 'dist' to its children and increment it at each level. When 'dist' == K, print those nodes.
  2. Nodes at distance K from the given node with its parent node in the path. For this we can return 'dist' from our recursive function, incrementing it at each step before returning. When the returned value at an ancestor is K we print that ancestor; when it is smaller than K, we look in the ancestor's other subtree for nodes at the remaining distance. Special case: if we are doing inorder traversal, we need to handle the case where the given node is the right child of its parent. In this case we will have to traverse the left child of its parent again.
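
A possible Python sketch of the two cases is below. The TreeNode class and the function names are assumptions made for the example, and the result is collected in a list rather than printed so it is easier to check.

```python
class TreeNode:
    """Hypothetical binary tree node used only for this example."""
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right


def nodes_at_distance_k(root, target, k):
    """Return the values of all nodes exactly k edges away from target."""
    out = []

    def down(node, dist):
        # Case 1: collect the nodes `dist` levels below `node`.
        if node is None or dist < 0:
            return
        if dist == 0:
            out.append(node.val)
            return
        down(node.left, dist - 1)
        down(node.right, dist - 1)

    def up(node):
        # Case 2: return the distance from `node` to target,
        # or -1 if target is not inside this subtree.
        if node is None:
            return -1
        if node is target:
            down(node, k)
            return 0
        for child, other in ((node.left, node.right), (node.right, node.left)):
            d = up(child)
            if d != -1:
                d += 1  # distance from this ancestor to target
                if d == k:
                    out.append(node.val)
                else:
                    # Spend the remaining distance in the other subtree.
                    down(other, k - d - 1)
                return d
        return -1

    up(root)
    return out


target = TreeNode(5, TreeNode(6), TreeNode(2, TreeNode(7), TreeNode(4)))
root = TreeNode(3, target, TreeNode(1, TreeNode(0), TreeNode(8)))
print(sorted(nodes_at_distance_k(root, target, 2)))  # [1, 4, 7]
```

Checking both children explicitly, instead of relying on the traversal order, avoids the right-child special case mentioned above.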

Thursday, 27 October 2016

Tomcat Performance

Tomcat's basic status can be accessed via the url: http://localhost:8080/manager/status. It shows the thread and connection stats.

To see detailed tomcat statistics, enable JMX beans for tomcat by adding the following to setenv.sh - "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=15555"

Then the MBeans for tomcat and your specific webapp/Servlet can be accessed via jconsole or JMXProxyServlet.
https://tomcat.apache.org/tomcat-7.0-doc/manager-howto.html#Using_the_JMX_Proxy_Servlet

You need to add following entries to conf/tomcat-users.xml to be able to access manager/status and manager/jmxproxy urls.
<role rolename="manager-gui"/>
<role rolename="manager-status"/>
<role rolename="manager-script"/>
<role rolename="manager-jmx"/>

 <role rolename="tomcat"/>
 <role rolename="role1"/>
 <user username="tomcat" password="tomcat" roles="tomcat,manager-gui,manager-status,manager-jmx"/>
 <user username="both" password="tomcat" roles="tomcat,role1"/>
 <user username="role1" password="tomcat" roles="role1"/>

Source: https://www.mulesoft.com/tcat/tomcat-performance

Tuning Tomcat Performance For Optimum Speed

Tomcat configuration is by default geared towards first time users looking for powerful, hassle-free, out-of-the-box functionality. However, when deployed in a real-world setting, where high server load can be expected and achieving the best possible peak load performance is vital, it is important to customize these default settings based on your site's needs.
This may seem like a daunting task, and it is certainly not a small one, but the good news is that Tomcat is a very configurable system, and by following a logical process of testing and implementation, you can vastly improve your site's performance.

Understanding Limits

While some of Tomcat's settings can be adjusted in a way that will increase any site's performance, the most beneficial techniques for increasing Tomcat's performance are need-specific - that is, they are geared towards making Tomcat perform in the best way for your site, based on factors such as the types of traffic you serve and the web applications your server runs.
Before getting down to business, let's take a look at the biggest limits on your site's performance. Three factors negatively impact Tomcat's performance more than any others - network delays, remote client latency, and application architecture.
Two of these factors - network delays and remote client latency - are for the most part unavoidable, although using a web server in front of Tomcat to serve static content will free up more power for dynamic content. Remote client latency can be mitigated by compressing content with Apache mod_deflate (or the older mod_gzip) or Tomcat's compression filter.
However, the largest increases in performance are directly related to the architecture of your web applications. Consequently, the first step towards streamlining Tomcat's performance is determining the needs of your site, and using benchmark tools to obtain a clear, comprehensive picture of your site's current performance.

Establishing a Benchmark

Before any improvements can be made, it is essential to establish an accurate benchmark of your site's current performance, so that any changes can be measured in a useful, quantifiable manner.
There are a wide variety of both commercial and free tools that you can use to measure Tomcat's performance. Two of the most effective, Apache Jakarta JMeter, and ab, an HTTP server benchmark tool provided by Apache, can be obtained for free on Apache's website, along with further documentation and more specialized benchmark tools.
To generate a high-quality benchmark, it is important to simulate the real world scenario in which your site operates as closely as possible. Your test system should be as close a match to your production system as possible. Use the same hardware, OS, and software, and populate databases with the expected real-world record load. Generate traffic that is characteristic of your real-world traffic - if you are expecting lots of human traffic, don't test your system with repetitive HTTP requests.
When running your benchmarks, make sure to include tests for any special situations such as short term traffic spikes or searches that return extremely large result sets. Your final tests should be run over a two to three day period, as factors such as variable JVM performance and Tomcat memory leaks may only expose themselves after a longer time period.

General Tuning Techniques

Now that you've established your benchmark, it's time to apply a few techniques that provide a general increase in performance for almost all Tomcat users. This will allow you to determine how much your application's architecture needs to be modified.

Tuning JVM

The proper Tomcat JVM configuration is essential for getting the most out of your server.
Before you start changing any settings, you should make sure that you have chosen the most logical JVM for your site's needs. There are a growing number of JVM vendors, and if you do not require any JDK-specific functionality, it is a good idea to run some benchmarks and see which solution gives you the best performance. Also, make sure to upgrade to the latest stable release of your JVM, as this may give you a sizable performance boost right away.
Next, consider experimenting with your JVM threading configuration. If your JVM supports both green and native threads, you should try both models to determine the best choice for your site. If you are running I/O bound applications, the native thread model should offer you improved performance. However, green threads will decrease the load placed on your machine. If you are unsure which option to choose, native threads are usually a good choice.
Certain JVM processes, such as garbage collection and memory reallocation, can be a drain on your server. You can reduce the frequency with which these processes occur by using the -Xmx and -Xms switches to control how JVM handles its heap memory.
JVM garbage collection can use up valuable CPU power that you want to be used to serve web requests. To reduce the frequency with which the JVM invokes garbage collection, use the -Xmx switch to start the JVM with a higher maximum heap memory. This will free up CPU time for the processes you really care about.
To maximize the effectiveness of this technique, use the -Xms switch to ensure that the JVM's initial heap memory size is equal to the maximum allocated memory. This will keep the Tomcat JVM from having to reallocate and resize its heap memory, which will free up additional CPU cycles for Tomcat to serve requests. If your web applications can handle the possibility of lower total garbage collection throughput, you should try enabling incremental collection with -Xincgc.
If you need more information on the way your current configuration is handling your collection load, use -verbose:gc to capture performance data.

Configuring Connectors

Basing your maxThreads Connector thread pool settings on an accurate estimation of your web request load is essential for getting the most out of Tomcat.
Values that are too small can leave you without enough threads to handle all your requests, and prevent Tomcat from effectively utilizing your server hardware to increase performance. Values that are too high significantly increase Tomcat's startup time, which is a critical issue at peak traffic intervals. Experiment with different values to determine the best middle ground, and you should see an increase in performance.

Compression

By default, the compression attribute is set to off, but some applications perform better when it is switched to on. Try changing your settings and see what works best for your site. If you find that turning compression on increases your performance, make sure you use the compressableMimeType setting (spelled compressibleMimeType in newer Tomcat releases) to specify what types of data you want compressed.

HTTP, HTTPS, and HTTPD

In general, using HTTP instead of HTTPS will result in much better Tomcat performance. However, HTTP may not be right for your site. If you require the security of HTTPS, despite its slow speed compared to HTTP, you may have to consider adding additional servers closer to your users to increase speed. The problem lies in the verbose traffic HTTPS generates during requests, which increases the overall serve time for users with higher pings.
Whatever you do, using Apache HTTPD to proxy your requests should be avoided at all costs, as it will decrease your performance by nearly 50%.

Web Servers For Static Content

Tomcat's major strength is dynamic content generation, and it will balance loads better if it is not responsible for anything else. Dedicating a web server in front of Tomcat to serve any static content your site requires is a quick way to free up more power to serve requests.

Stay Current

Every major Tomcat release is optimized for higher performance across the board. Taking the time to keep your version updated will save you big performance headaches in the future.

Tuning Your Applications For Performance

Now that you've done everything you can to customize Tomcat's configuration to match your needs, it's time to look at improvements to your application's architecture. This process is more complex, but the performance gains you will see are exponentially greater.
The best time to think about optimizing your application for better performance with Tomcat is during the development phase. Using tools such as JProbe or OptimizeIt to search for any bottlenecks due to thread synchronization will save you a lot of hassle when you deploy your application.
Additionally, read up on optimization techniques specific to each of your application's elements. If your database performance is slower than you'd like, consider using middleware to persist and cache objects. You'll see less thrashing of the JVM for creation, which means less garbage collection, and less db query latency. Consider factors such as the suitability of your protocol for the project. For example, if you are loading large amounts of data, and you need high performance, XML is the wrong choice.
Design your application to distribute work across the tools that do that work most efficiently. If you are serving dynamic pages that are in reality more static than dynamic, consider changing them to static pages and serving them with your web server to take some unnecessary load off of Tomcat.
If the page changes too frequently to make this option logical, try temporarily caching the dynamic content, so that it doesn't need to be regenerated over and over again. Any techniques you can use to cache work that's already been done instead of doing it again should be used - this is the key to achieving the best Tomcat performance. The most significant limits to performance are the demands of your web application, so consider all your design choices carefully.
Implement these techniques, and you're well on your way to great Tomcat performance.

Monday, 3 October 2016

Given 4 lists of numbers (including -ve no.s) of length n, find all combinations of numbers (a, b, c, d) from respective lists which sum to zero.

We need to find 4 numbers, one from each list which sum up to zero.

Solving for 2 lists:
We can solve for 2 lists in O(n) as follows,

  • Put the first list in a hashmap.
  • Iterate the second list
    • for each element s(i) in second list, lookup "0-s(i)" in the map
    • If found we got one pair

Solving for 4 lists:
  • Combine first and second list as follows,
    • For each combination of f(i) and s(j), create an element f(i)+s(j)
    • This gives us n^2 elements
  • Combine 3rd and 4th lists in a similar manner and get another n^2 list.
  • Now we can solve for these two lists of size n^2 in O(n^2).
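
The combine-and-lookup idea can be sketched like this. The function name is an assumption made for the example; note that the original (a, b) pairs are kept alongside each pair sum so the full combinations can be reported, not just counted.

```python
from collections import defaultdict
from itertools import product


def four_list_zero_sums(l1, l2, l3, l4):
    """Return every (a, b, c, d), one value per list, with a + b + c + d == 0."""
    # Combine list 1 and list 2: map each pair sum to the pairs that produce it.
    pair_sums = defaultdict(list)
    for a, b in product(l1, l2):
        pair_sums[a + b].append((a, b))

    # Combine list 3 and list 4 and look up the negated sum, as in the 2-list case.
    result = []
    for c, d in product(l3, l4):
        for a, b in pair_sums.get(-(c + d), ()):
            result.append((a, b, c, d))
    return result


print(four_list_zero_sums([1, 2], [-1, 3], [0, -2], [0, -3]))
# [(1, -1, 0, 0), (2, 3, -2, -3)]
```

Building and probing the map takes O(n^2) steps; listing the matches themselves can take longer when many combinations sum to zero.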

Wednesday, 20 July 2016

Java like thread dumps for the Python process

import signal
import sys
import threading
import traceback

def dumpstacks(signum, frame):
    """Signal handler: print the current stack of every live thread."""
    id2name = {th.ident: th.name for th in threading.enumerate()}
    code = []
    for thread_id, stack in sys._current_frames().items():
        code.append("\n# Thread: %s(%d)" % (id2name.get(thread_id, ""), thread_id))
        for filename, lineno, name, line in traceback.extract_stack(stack):
            code.append('File: "%s", line %d, in %s' % (filename, lineno, name))
            if line:
                code.append("  %s" % line.strip())
    print("\n".join(code))

# Send SIGQUIT to the process (kill -QUIT <pid>) to get the dump.
signal.signal(signal.SIGQUIT, dumpstacks)

Saturday, 28 May 2016

List the methods/operations available on an object

Use the dir(object) command to list the various operations available on the object.

eg.
>>> dir(stringObj)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
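
The raw dir() output mixes dunder names with regular methods. A small helper along these lines (public_methods is a name made up for this sketch) trims it down to the public callables:

```python
def public_methods(obj):
    """Return the names of obj's callable attributes that do not start with '_'."""
    return [name for name in dir(obj)
            if not name.startswith("_") and callable(getattr(obj, name))]


print(public_methods("abc")[:3])
```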

Saturday, 7 May 2016

SSH - Black Magic


Remote port forwarding - Public connections to my laptop on any network:
------------------------------------------------------------------------
ssh -R ${remote}:${local} remote

The remote machine listens on the port; connections made there are forwarded back to the local machine.
Example:
1.) Run this on your Mac
ssh -R *:2020:localhost:22  -i awskey.pem ec2-user@ec2-xx-xx-xxx-xxx.us-west-2.compute.amazonaws.com

2.) Then on AWS machine to access your Mac:
ssh macuser@localhost -p 2020


Local port forwarding:
-------------------------------------------------
ssh -L ${local}:${remote} remote

The local machine listens on the port; connections made there are forwarded to the remote side.

Dynamic port forwarding (SOCKS 5 proxy):
-------------------------------------------------
ssh -D ${local_port} remote


Original link:
https://vimeo.com/54505525

Thursday, 31 March 2016

Finding the exact binary library containing the symbols in a directory of binaries

Sometimes while compiling a program you run into an issue of missing symbols; e.g.

Undefined symbols for architecture x86_64:
  "cv::MSER::create(int, int, int, double, double, int, double, double, int)", referenced from:
      getBlackAndWhiteImage(cv::Mat, int, double, double, double, std::__1::vector<std::__1::vector<cv::Point_<int>, std::__1::allocator<cv::Point_<int> > >, std::__1::allocator<std::__1::vector<cv::Point_<int>, std::__1::allocator<cv::Point_<int> > > > >&, std::__1::vector<cv::Rect_<int>, std::__1::allocator<cv::Rect_<int> > >&) in showimg-53afc0.o
ld: symbol(s) not found for architecture x86_64


Now you need to find the exact binary file that defines the symbol in, let's say, a directory of libraries. This command can help you with it:

find /usr/local/lib/libopencv_* | awk '{ print "echo "$1"; nm "$1" | grep -i MSER" }' | sh

Saturday, 19 September 2015

Find maximum value of Sum( i*arr[i]) with only rotations on given array allowed

Given an array, only the rotation operation is allowed on it. We can rotate the array as many times as we want. Return the maximum possible value of the summation of i*arr[i].

Example:
Input: arr[] = {1, 20, 2, 10}
Output: 72
We can get 72 by rotating the array twice.
{2, 10, 1, 20}
20*3 + 1*2 + 10*1 + 2*0 = 72

Input: arr[] = {10, 1, 2, 3, 4, 5, 6, 7, 8, 9};
Output: 330
We can get 330 by rotating the array 9 times.
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
0*1 + 1*2 + 2*3 ... 9*10 = 330

Solution:

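Compute the weighted sum (let's say, A) for the initial array once, and maintain the sum (let's say, B) of the full array except the first and last element of the current rotation.
After each clockwise rotation every element except the last moves one index to the right and the last element moves to index 0. So the new A increases by B plus the current first element, and reduces by (n-1) * the last element. Equivalently, with S the sum of the whole array, new A = A + S - n * last. Each rotation is therefore an O(1) update, and the answer is the maximum A over all n rotations, found in O(n).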
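
A short Python sketch of this update, assuming right (clockwise) rotations and using S for the sum of the whole array, so that each rotation changes the weighted sum by S - n * last. The function name is an assumption made for the example.

```python
def max_rotation_sum(arr):
    """Return the maximum of sum(i * arr[i]) over all rotations of arr."""
    n = len(arr)
    total = sum(arr)                                   # S, the same for every rotation
    current = sum(i * x for i, x in enumerate(arr))    # value for the unrotated array
    best = current
    for k in range(1, n):
        last = arr[n - k]  # element sitting at the end before the k-th right rotation
        # Everything except `last` gains one index; `last` drops from n-1 to 0.
        current += total - n * last
        best = max(best, current)
    return best


print(max_rotation_sum([1, 20, 2, 10]))                  # 72
print(max_rotation_sum([10, 1, 2, 3, 4, 5, 6, 7, 8, 9])) # 330
```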

Wednesday, 14 January 2015

Checking symbols/functions in *nix binary library so/a files

What the man page says - "Nm  displays  the  name  list (symbol table) of each object file in the argument list."

If you want to list the symbols or function names in the binary library you can see their names using this command.

Example:
nm  opt/hadoop-2.4.0/lib/native/libhadoop.so

If you have to check whether there is support for snappy in your hadoop binary do this:
nm  opt/hadoop-2.4.0/lib/native/libhadoop.so | grep -i snappy

And you should see something like this:
Java_org_apache_hadoop_io_compress_snappy_SnappyCompressor_compressBytesDirect
0000000000003960 T Java_org_apache_hadoop_io_compress_snappy_SnappyCompressor_initIDs
0000000000003bb0 T Java_org_apache_hadoop_io_compress_snappy_SnappyDecompressor_decompressBytesDirect
0000000000003f60 T Java_org_apache_hadoop_io_compress_snappy_SnappyDecompressor_initIDs
0000000000206cf0 b dlsym_snappy_compress
0000000000206d20 b dlsym_snappy_uncompress

Without these Java native methods being compiled and available in the libhadoop.so, MR runtime will also complain that "native snappy library not available".

Thursday, 6 November 2014

Using Memory Mapped files in Java

When two processes map the same file in memory, the memory that one process writes is seen by another process, so memory mapped files can be used as an interprocess communication mechanism. We can say that memory-mapped files offer the same interprocess communication services as shared memory with the addition of filesystem persistence. However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory.


/*--------------------------------------------------------------
Class MemoryMapWriter - creates mmap file
---------------------------------------------------------------*/

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MemoryMapWriter {

    public static void main(String[] args) throws Exception {
        File f = new File("/tmp/mapped.txt");
        f.delete();

        FileChannel fc = new RandomAccessFile(f, "rw").getChannel();

        long start = 0; // file offset of the current mapping
        long counter = 1;
        long HUNDREDK = 100000;
        long startT = System.currentTimeMillis();
        long noOfMessage = HUNDREDK * 100 * 10;

        // One long (8 bytes) per message.
        long bufferSize = 8 * noOfMessage;
        MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_WRITE, 0,
                bufferSize);

        for (;;) {
            // Map the next window of the file once the current one is full.
            if (!mem.hasRemaining()) {
                start += mem.position();
                mem = fc.map(FileChannel.MapMode.READ_WRITE, start, bufferSize);
            }
            mem.putLong(counter);
            counter++;
            if (counter > noOfMessage)
                break;
        }
        long endT = System.currentTimeMillis();
        long tot = endT - startT;
        System.out.println(String.format("No Of Message %s , Time(ms) %s ",
                noOfMessage, tot));

        fc.close();
        unmap(fc, mem);
    }

    // Releases the mapping through the JDK-internal static "unmap" method of the
    // channel implementation. This is not a public API and may not be
    // accessible on every JDK version.
    private static void unmap(FileChannel fc, MappedByteBuffer bb)
            throws Exception {
        Class<?> fcClass = fc.getClass();
        java.lang.reflect.Method unmapMethod = fcClass.getDeclaredMethod(
                "unmap", new Class[] { java.nio.MappedByteBuffer.class });
        unmapMethod.setAccessible(true);
        unmapMethod.invoke(null, new Object[] { bb });
    }

}

/*--------------------------------------------------------------
Class MemoryMapReader - reads mmap file
---------------------------------------------------------------*/
package org.g;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MemoryMapReader {

    public static void main(String[] args) throws FileNotFoundException,
            IOException, InterruptedException {

        FileChannel fc = new RandomAccessFile(new File("/tmp/mapped.txt"), "rw")
                .getChannel();

        // long bufferSize=8*10000;
        MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0,
                fc.size());
        long oldSize = fc.size();

        long currentPos = 0;
        long xx = currentPos;

        long startTime = System.currentTimeMillis();
        long lastValue = -1;

        int exporterId = 8;

        // Sanity check: the writer stored the value i + 1 in slot i.
        mem.position(exporterId * 8);
        long res = mem.getLong() - 1;
        if (res != exporterId) {
            System.out.println("this should not happen !! ");
        }

        System.out.println("Resetting pos ...");
        // reset position again
        mem.position(0);

        for (;;) {

            // Read every long in the mapping, remembering the last one.
            while (mem.hasRemaining()) {
                lastValue = mem.getLong();
                currentPos += 8;
            }
            if (currentPos < oldSize) {

                xx = xx + mem.position();
                System.out.println("this should never occur !!! ");
                // mem = fc.map(FileChannel.MapMode.READ_ONLY,xx, bufferSize);
                continue;
            } else {
                long end = System.currentTimeMillis();
                long tot = end - startTime;
                System.out.println(String.format(
                        "Last Value Read %s , Time(ms) %s ", lastValue, tot));
                break;
            }

        }

    }

}

More Reading:
--------------------

To unmap the memory mapped byte buffer - http://lotusandjava.blogspot.in/2012/02/how-to-unmap-mappedbytebuffer.html
http://stackoverflow.com/questions/8462200/examples-of-forcing-freeing-of-native-memory-direct-bytebuffer-has-allocated-us


Measuring system time with microsecond precision in c++

#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>

struct timeval start, end;
long mtime, secs, usecs;

int main(int argc, char * argv[]){

    gettimeofday(&start, NULL);

    sleep(10);

    gettimeofday(&end, NULL);
    secs  = end.tv_sec  - start.tv_sec;
    usecs = end.tv_usec - start.tv_usec;
    // Total elapsed time in microseconds.
    mtime = secs * 1000 * 1000 + usecs;
    printf("Elapsed time: %ld usecs\n", mtime);
    return 0;
}


OUTPUT:
-------------
> ./a.out
Elapsed time: 10003866 usecs

JNI - Calling native functions in Java

This example has been taken from this link -  http://www.science.uva.nl/ict/ossdocs/java/tutorial/native1.1/stepbystep/index.html

Hello world example for native functions :

1.) Compile your Java code using javac.
/*-------------------------------------
Class HelloWorld
--------------------------------------*/
class HelloWorld {
    public native void displayHelloWorld();

    // Declare a native method average() that receives two ints and return a double containing the average
    public native double average(int n1, int n2);

    static {
        //System.loadLibrary("hello");
        // You can specify the full path to the lib file
        System.load("/Users/xyz/Documents/workspace/JNITest/src/libhello.so");
    }
}


/*-------------------------------------
Class JNITest - with main()
--------------------------------------*/
public class JNITest {
    public static void main(String[] args) throws InterruptedException {
        HelloWorld hw = new HelloWorld();
        hw.displayHelloWorld();

        Thread.sleep(1000);
        System.out.println("Exiting ... " + hw.average(2, 100));
    }

}

2.) Generate the c++ header file using the javah command (javah was removed in JDK 10; on newer JDKs use "javac -h . HelloWorld.java" instead)

> javah HelloWorld

3.) Write the c++ native function implementation

#include <jni.h>
#include "HelloWorld.h"
#include <stdio.h>

JNIEXPORT jdouble JNICALL Java_HelloWorld_average(JNIEnv * jenv, jobject jo, jint n1 , jint n2){

return ((jdouble)n1 + n2) / 2.0;
}

JNIEXPORT void JNICALL
Java_HelloWorld_displayHelloWorld(JNIEnv *env, jobject obj)
{
  printf("Hello world!\n");
  return;
}

4.) Compile your c++ code and create a shared library

Note: If you have problems including <jni.h> on a linux box, ensure that the include path contains your java's include directories. For example:
g++ -O3 -shared -fPIC mibc.cpp -I /usr/java/default/include/ -I /usr/java/default/include/linux/ -o libhello.so -lrt

5.) Run your java program
Place your shared library in the correct path as specified in the java System.load() call. And run the java program using "java JNITest"

More Reading:
To use different data types for native calls -  https://www3.ntu.edu.sg/home/ehchua/programming/java/JavaNativeInterface.html
JNI performance and overheads -
http://thinkingandcomputing.com/2014/03/30/eliminating-jni-overhead/
http://a-hackers-craic.blogspot.in/2012/03/jni-overheads.html
http://stackoverflow.com/questions/6175209/low-latency-ipc-between-c-and-java
http://stackoverflow.com/questions/7699020/what-makes-jni-calls-slow
http://www.ibm.com/developerworks/java/library/j-jni/
http://normanmaurer.me/blog/2014/01/07/JNI-Performance-Welcome-to-the-dark-side/
http://www.javaworld.com/article/2077554/learn-java/java-tip-54--returning-data-in-reference-arguments-via-jni.html
http://stackoverflow.com/questions/1632367/passing-pointers-between-c-and-java-through-jni

Tuesday, 7 October 2014

Find smallest +ve no. missing from the array

Original link:http://www.careercup.com/question?id=12708671

You are given an unsorted array with both positive and negative elements. You have to find the smallest positive number missing from the array in O(n) time using constant extra space. 
Eg: 
Input = {2, 3, 7, 6, 8, -1, -10, 15} 
Output = 1 

Input = { 2, 3, -7, 6, 8, 1, -10, 15 } 
Output = 4

Solution:
1.) Partition the array into the values smaller than or equal to zero and the +ve integers using the Quicksort partition method. This can be done in O(n) time and in the same array.
2.) Now you know the index of the first +ve number, say this is idxZ. Let m be the count of +ve numbers.
3.) Traverse starting from idxZ and, for each value v = |A[i]| with v <= m, set the sign bit of the value at position (idxZ + v - 1) to 1. You do this only when that position lies within the bounds of the array, which is why the absolute value is used: the element may already have been marked.
4.) Now starting at 'idxZ' find the first position p where the sign bit is UNSET. The number you are looking for is (p - idxZ + 1). If every position is marked, the answer is m + 1.
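
A Python sketch of these steps, using negation of a value as the "sign bit". The function name is an assumption made for the example, and the input list is modified in place, which is what keeps the extra space constant.

```python
def smallest_missing_positive(a):
    """Return the smallest positive integer not present in a (a is modified)."""
    # Step 1: move every value <= 0 to the front; positives end up at idx_z onward.
    idx_z = 0
    for i in range(len(a)):
        if a[i] <= 0:
            a[i], a[idx_z] = a[idx_z], a[i]
            idx_z += 1
    m = len(a) - idx_z  # number of positive values

    # Step 3: for each value v in 1..m, mark slot idx_z + v - 1 by making it negative.
    for i in range(idx_z, len(a)):
        v = abs(a[i])  # the slot itself may already have been marked
        if v <= m:
            pos = idx_z + v - 1
            if a[pos] > 0:
                a[pos] = -a[pos]

    # Step 4: the first unmarked slot gives the missing number.
    for i in range(idx_z, len(a)):
        if a[i] > 0:
            return i - idx_z + 1
    return m + 1


print(smallest_missing_positive([2, 3, 7, 6, 8, -1, -10, 15]))  # 1
print(smallest_missing_positive([2, 3, -7, 6, 8, 1, -10, 15]))  # 4
```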

Friday, 13 June 2014

Read an HDFS file functional way in scala

This example reads an HDFS file in scala in a functional manner. We use Stream class to read data lazily when required.

val path = new Path("/data/abc.csv")
val conf = new Configuration()
val fileSystem = FileSystem.get(conf)
val stream = fileSystem.open(path)

// It is important to make this a def: if we make it a val, memory usage might bloat up
// because the val keeps a reference to the already-read values of the stream as well
def readLines = Stream.cons(stream.readLine, Stream.continually( stream.readLine))

readLines.takeWhile(_ != null).foreach(line => println(line))

Tuesday, 4 March 2014

Learning Scala

Reading : http://twitter.github.io/scala_school/

def f1(x : Int) : Int = x+1

def fact(z: Int): Int = if (z <= 1) 1 else z * fact(z - 1)

Anonymous functions
-------------------
scala> val addOne = (x: Int) => x + 1
addOne: (Int) => Int = <function1>

scala> addOne(1)
res4: Int = 2

def timesTwo(i: Int): Int = {
  println("hello world")
  i * 2
}

Partial application
-------------------

scala> def adder(m: Int, n: Int) = m + n
adder: (m: Int,n: Int)Int

scala> val add2 = adder(2, _:Int)
add2: (Int) => Int = <function1>

scala> add2(3)
res50: Int = 5

Curried functions
-----------------

scala> def multiply(m: Int)(n: Int): Int = m * n
multiply: (m: Int)(n: Int)Int


You can take any function of multiple arguments and curry it.

scala> val curriedAdd = (adder _).curried
curriedAdd: Int => (Int => Int) = <function1>

scala> val addTwo = curriedAdd(2)
addTwo: Int => Int = <function1>

scala> addTwo(4)
res22: Int = 6


Variable length arguments
-------------------------

def capitalizeAll(args: String*) = {
  args.map { arg =>
    arg.capitalize
  }
}

scala> capitalizeAll("rarity", "applejack")
res2: Seq[String] = ArrayBuffer(Rarity, Applejack)


Classes
-------

scala> class Calculator {
     |   val brand: String = "HP"
     |   def add(m: Int, n: Int): Int = m + n
     | }
defined class Calculator

scala> val calc = new Calculator
calc: Calculator = Calculator@e75a11



http://jim-mcbeath.blogspot.in/2009/05/scala-functions-vs-methods.html

Friday, 27 December 2013

Gunzip the *.gz files on hadoop HDFS

If you have some gzipped files (*.gz) on your HDFS and you don't want to bring them to local disk for unzipping, you can do it as follows. The command only prints one unzip pipeline per file; append "| sh" to actually run them:

hadoop dfs -ls /data/7days/netflow/2013/11/15/*/* | grep -i gz | awk '{print "hadoop dfs -cat "$8"  | gunzip | hadoop dfs -put - "substr($8,1,length($8)-3)}'

Sunday, 29 September 2013

CSS Positioning


  • The display property can be set to:
    • block - takes up entire width of the html page and does NOT let any other elements sit next to it.
    • inline-block - allows other elements to sit next to itself in the same line, while keeping its own width and height.
    • inline - allows elements to sit in the same line, but width and height are ignored, so a block element like <p> loses its dimensions.
    • none - the element is not displayed.
  • To place a block element with a set width in the horizontal center of its container use "margin:auto" as the style.
  • We can use negative margins to move an element off the page as well (padding cannot be negative).
  • When you float an element on the page, you're telling the webpage: "I'm about to tell you where to put this element, but you have to put it into the flow of other elements." This means that if you have several elements all floating, they all know the others are there and don't land on top of each other.
  • If you tell an element to clear: left, it will immediately move below any floating elements on the left side of the page; it can also clear elements on the right. If you tell it to clear: both, it will get out of the way of elements floating on the left and right!
  • If you don't specify an element's positioning type, it defaults to static. This just means "where the element would normally go." If you don't tell an element how to position itself, it just plunks itself down in the document.
  • The first type of positioning is absolute positioning. When an element is set to position: absolute, it's then positioned in relation to the first parent element it has that doesn't have position:static. If there's no such element, the element with position: absolute gets positioned relative to <html>.
  • Relative positioning is more straightforward: it tells the element to move relative to where it would have landed if it just had the default static positioning.
  • Finally, fixed positioning anchors an element to the browser window—you can think of it as gluing the element to the screen. If you scroll up and down, the fixed element stays put even as other elements scroll past.


Wednesday, 25 September 2013

Setting up Django Web-app on amazon linux - AWS micro instance


  • SSH into your linux AWS system using a command like this: 

chmod 400 ~/pvtkey.pem
ssh -i ~/pvtkey.pem ec2-user@<AWS-instance-public-IP>
  • Install Apache httpd server:
sudo yum install httpd
sudo /etc/init.d/httpd start
(or: sudo service httpd start)
sudo chkconfig httpd on
  • You can check what is installed with RPM

rpm -qa

  • Install Django:

wget https://www.djangoproject.com/m/releases/1.5/Django-1.5.4.tar.gz
tar xzvf Django-1.5.4.tar.gz

cd Django-1.5.4

sudo python setup.py  install
  • Install mod_wsgi:
sudo yum install mod_wsgi
  • Add a new user for django:
sudo useradd djangouser
su - djangouser
  • Edit httpd.conf file:
sudo vi /etc/httpd/conf/httpd.conf

NameVirtualHost *:80

<VirtualHost *:80>
WSGIDaemonProcess ec2-54-200-XXX-XXX.us-west-2.compute.amazonaws.com user=djangouser group=djangouser processes=5 threads=1
WSGIProcessGroup ec2-54-200-XXX-XXX.us-west-2.compute.amazonaws.com

    DocumentRoot /home/djangouser/web-app
    ServerName ec2-54-200-XXX-XXX.us-west-2.compute.amazonaws.com
    ErrorLog /home/djangouser/web-app/apache/logs/error.log
    CustomLog /home/djangouser/web-app/apache/logs/access.log combined
    WSGIScriptAlias / /home/djangouser/web-app/apache/django.wsgi

    <Directory /home/djangouser/web-app/apache>
        Order deny,allow
        Allow from all
    </Directory>

    <Directory /home/djangouser/web-app/templates>
        Order deny,allow
        Allow from all
    </Directory>

    <Directory /home/djangouser/web-app/bmdata/static>
        Order deny,allow
        Allow from all
    </Directory>

    <Directory /usr/lib/python2.6/site-packages/django/contrib/admin/static/admin/>
        Order deny,allow
        Allow from all
    </Directory>

    LogLevel warn

    Alias /static/admin/ /usr/lib/python2.6/site-packages/django/contrib/admin/static/admin/
    Alias /static/ /home/djangouser/web-app/bmdata/static/
</VirtualHost>

WSGISocketPrefix /home/djangouser/web-app/apache/run/

  • Add django.wsgi 
import os, sys
sys.path.append('/home/djangouser/web-app')
os.environ['DJANGO_SETTINGS_MODULE'] = 'BMonitor.settings'
import django.core.handlers.wsgi

application = django.core.handlers.wsgi.WSGIHandler()

  • Installing python libs for matplotlib and numpy on AWS
sudo yum install  gcc-c++
sudo yum install  gcc-gfortran
sudo yum install python-devel
sudo yum install atlas-sse3-devel
sudo yum install lapack-devel
sudo yum install libpng-devel
sudo yum install freetype-devel
sudo yum install zlib-devel

tar xzvf matplotlib-1.3.1.tar.gz
cd matplotlib-1.3.1

sudo python setup.py build
sudo python setup.py install