Sunday, 17 February 2013

Understanding open source architectures I



Apache Mahout

Collaborative filtering (CF) is a technique used by some recommender systems. Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue x than to have the opinion on x of a person chosen randomly.

Currently Mahout supports mainly four use cases: 

  1. Recommendation mining takes users' behavior and from that tries to find items users might like. 
  2. Clustering takes e.g. text documents and groups them into groups of topically related documents. 
  3. Classification learns from exisiting categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. 
  4. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies, which individual items usually appear together.

Best command line tools for Linux, MAC

Building a search engine based on Inverted indexes

Monday, 4 February 2013

Using multi-dimensional associative arrays in AWK


cat /tmp/33 | awk -F'[\\^\\[\\]]' '{c[$5,$6]+=1}END{for( i in c) {split(i,sep,SUBSEP); print sep[1],sep[2], c[i] ;}}' 


Sunday, 3 February 2013

Difference between ClassNotFoundException and NoClassDefFoundError

The difference from the Java API Specifications is as follows.
Thrown when an application tries to load in a class through its string name using:
  • The forName method in class Class.
  • The findSystemClass method in class ClassLoader.
  • The loadClass method in class ClassLoader.
but no definition for the class with the specified name could be found.
Thrown if the Java Virtual Machine or a ClassLoader instance tries to load in the definition of a class (as part of a normal method call or as part of creating a new instance using the new expression) and no definition of the class could be found.
The searched-for class definition existed when the currently executing class was compiled, but the definition can no longer be found.
So, it appears that the NoClassDefFoundError occurs when the source was successfully compiled, but at runtime, the required class files were not found. This may be something that can happen in the distribution or production of JAR files, where not all the required class files were included.
As for ClassNotFoundException, it appears that it may stem from trying to make reflective calls to classes at runtime, but the classes the program is trying to call is does not exist.
The difference between the two is that one is an Error and the other is an Exception. WithNoClassDefFoundError is an Error and it arises from the Java Virtual Machine having problems finding a class it expected to find. A program that was expected to work at compile-time can't run because of class files not being found, or is not the same as was produced or encountered at compile-time. This is a pretty critical error, as the program cannot be initiated by the JVM.
On the other hand, the ClassNotFoundException is an Exception, so it is somewhat expected, and is something that is recoverable. Using reflection is can be error-prone (as there is some expectations that things may not go as expected. There is no compile-time check to see that all the required classes exist, so any problems with finding the desired classes will appear at runtime.