Lateral Thinking, Analytical Mathematics and Computing

Once, Carl Friedrich Gauss said, "Mathematics is the queen of sciences and number theory is the queen of mathematics." Mathematics is a tool to prove our analogy. Here I post some of my collections that emphasis significance of mathematics in real life. Posts also include questions on computing, mathematics, Free-Open-Source-Software, Algorithms, Data Structures and computers in general. All thoughts/opinions are mine, not related to my employer.

Saturday, October 29, 2011

Online Median Tracing Algorithm

Given a stream of integers (flow read from disk or some other resource, like a sensor counting some events). How can we find the median of elements read so far?

It is an interesting online algorithm. Online, we mean the algorithm that can output result after any instance of input processing. For example, after reading i-elements we should be able to tell the median of i-elements read till now.

I read the question on Gulli blog, and came up with some solution. I have posted the algorithm on Geeksforgeeks.

http://www.geeksforgeeks.org/archives/14873

Again, it is just for my reference of link. Any comments, please post on Geeksforgeeks.

Saturday, August 13, 2011

Operating Systems - A Fun

August 13, 2011
I read “operating systems” during my academics, and fortunately got an opportunity to work on a commercial OS development for embedded systems. The following list of resources will help to understand the OS internals,

1. Minix – the foundation behind Linux implementation. The source code is available from the book Operating Systems Design and Implementation, 3e, by Tanenbaum and Woodhull.

2. Micro C OS - II and Micro C OS – III. If you want simple OS implementation for embedded system, it is the book for you. The recent updated kernel (µCOS - III) supports virtual memory management. Code is free for educational purpose, get from “http://micrium.com/page/home”. The story behind Micro C OS is interesting, an example of what passion and need can do, read the story in the book, the same was available on their website.

3. eCos is another RTOS written in C++. However, it seems it is now defunct.

4. RTEMS – The first (IMHO) open source kernel implementing rate monotonic scheduling. Written in C++.

5. pintos – created by a Stanford student, available for free. Check their website.

The above links help to navigate real code. But for the concepts behind the code, we need to consult relevant books or papers.

For example, read this paper for Linux O(1) scheduler implementation, you need to be aware of priority arrays, complexity basics, etc. Similarly, for practical use of linked lists in Linux read Robert Love book.

Saturday, July 30, 2011

Decision Trees - Fake Coin

30-July-2011

I have pending post from few months. It is on application of decision trees to analyze algorithms.

The puzzle is given below,

"We are provided a two pan fair balance and N identically looking coins, out of which only one coin may be defective. How can we trace which coin, if any, is odd one, and also determine whether it is lighter or heavier in minimum number of trials in the worst case?"

Try to solve the puzzle on your own, and then read my systematic solution on the following link

http://geeksforgeeks.org/?p=9411".

There are other applications of decision trees, like analyzing comparison based sorting routines. We may think that comparison is cheap operation, however, in reality we deal with large objects than simple integers, where comparison is costly operation. We need decision trees to analyze lower bounds of comparison sorting.

There are other advantages of trees like solving game of Nim, probability problems, etc...

Again the link is for my reference. Any comments post on Geeksforgeeks.org site.

Monday, June 6, 2011

Text Search Engine

6-June-2011

Given huge text file or collection of documents. How will you search for a word in that collection?

Lucene is an open source library that indexes text for efficient information retrieval. Here is the project home page. Lucene can be effectively used to index words in any document format, that makes it flexible to adopt web applications. The story behind Lucene is interesting to read. The name Lucene is the name of better-half of the creator of Lucene software, named Dogg Cutting.

Lucene was originally implemented in Java, and there are port to other languages like CLucene in C++. Lucene can be used as information retrieval on small sites.

The Qt library includes Lucene port to Qt C++ under 3party libraries. We can experiment with the Qt code for information retrieval on small documents.

Saturday, May 7, 2011

Connecting Capacitances - Project Euler Problem#155

8 - May - 2011

I came across an interesting problem. Given N capacitors (or inductors, resistors) of same value say C (unit is Farad). How many difference capacitances can be derived from these N capacitors connected in series, parallel and series-parallel combination.

For example if N = 1, only one way it can be connected. If N = 2, they can be connected in two different ways of series and parallel. If N = 3, they can be connected in 7 different ways. That means we can derive 7 different capacitance values from given 3 capacitors of same value. These 7 combinations can be found pictorially on the Project Euler page.

Can we generalize it? Yes. Say the total possible capacitances are represented as S(n). If we observe, only the difference between S(n-1) and S(n-2) will participate using next capacitor. For example to calculate S(n) we need the difference DELTA = S(n-1) - S(n-2), and all combinations upto S(n-1). The new capacitor C(n) can be connected in series and parallel with every combination of DELTA..

With an example it will be clear. We know S(1) = 1, S(2) = 3 and S(3) = 7. To calculate S(4) we need DELTA = S(3) - S(2) = 4. We can connect the 4-th capacitor in series and parallel with these 4 combinations, overall it adds 4 * 2 = 8 more combination to the existing 7 combination upto S(3). So, clearly S(4) = 2 x DELTA + S(3) = 15. There is a recursive formula. It is evident from the recursion tree given below for S(5).

Solution can be generalized as

S(n) = 2 x (S(n-1) - S(n-2)) + S(n-1)

= 3 x S(n-1) - 2 x S(n-2)

Base case S(1) = 1, S(2) = 3 and S(3) = 7.

We can code above formula using recursion easily.

int totalPossibleCapacitances(int n)

{

static int S[] = {1, 3, 7};

if(n > 3)

// Non trival case

return 3*totalPossibleCapacitances(n-1) - 2*totalPossibleCapacitances(n-2)

else

// Trivial case

return S[n - 1];

}

Complexity: What will be the complexity? If we observe the recursion tree, the problems are overlapped. Similar to the Fibonacci series. The characteristic equation of generalized solution can be written as

x ^ 2 - 3*x + 2 = 0 whose roots are x1 = 1 and x2 = 2. The solution is in the form

S(n) = A(x1)^n + B(x2)^n where A and B are constants that depends on initial conditions. After substitution of S(2) and S(3) we get,

S(n) = 1.2 * (2)^n - 3 which is exponential.

So, the above function runs with exponential complexity. Can we do better? Yes, the well known technique "Dynamic Programming".

If we observe from S(4), every number is depending on past two numbers only. S(n) = 3*S(n-1) - 2*S(n-2), i.e. we need to store only the past two values to find next number. A three element array will be sufficient. Given below pseudo code,

int totalPossibleCapacitances(int n)

{

// For efficiency purpose

// static

int S[] = {1, 3, 7};

if(n > 3)

{

// Non trivial

int activeIndex = 0;

for(int i = 4; i <= n; i++)

{

S[activeIndex] = 3*S[ (activeIndex + 2) % 3 ] - 2*S[ (activeIndex + 1) % 3 ];

activeIndex++;

activeIndex %= 3;

}

if(activeIndex)

activeIndex--;

return S[activeIndex];

}

else

{

// Trivial

return (n > 0 ? S[n - 1] : -1);

}

Fun to code. The same applies to Inductors and Resistors. The series appears as 1, 3, 7, 15, 31, 63, 127, 255, ... which is one less than powers of 2. Ofcourse generating this series is quite easy. The emphasis is on the recursion tree, and dynamic programming.

Monday, May 2, 2011

Facebook - Friends Search

3 - May - 2011

When you type a name in friends list, the list automatically displays those filtered names containing typed-in characters. What data structure suites well in such situation? I will provide few links for the same in JavaScript.

Yet to update.

Sunday, May 1, 2011

Next Higher Number with Same Number of Set Bits

1 - May - 2011

Given a number say, N - an integer. Find the next larger number with same set (logic 1) number of bits.

For example consider N = 156 which is 10011100 in binary. It has 4 bits set to logic 1. The next immediate number with 4 bits set to logic 1 is 163.

Try out the solution and compare it with my post on Geeksforgeeks. Again I am posting it as a reference to Geeksforgeeks link.

If any clarification needed, post your comments there. They will be helpful to many engineers visiting Geeksforgeeks.

Friday, April 22, 2011

Opaque Pointer and Binary Compatibility

23, April 2011

Binary Compatibility:
When you compile a small application that prints "hello, world", we are implicitly depending on third party libraries usually developed by compiler vendor. If there is bug, or upgraded software from the vendor, can we run our application without recompiling? If yes, we say that the library is binary compatible.

Or in-other words, let us say we supplied a library, and later we come up with great ideas that requires changes in many classes layouts, how can you ship the upgraded libraries without recompiling client code? The solution is Handle/Body idiom (refer Advanced C++ book by Jim Copelin).

Binary compatibility depends on various factors Application Binary Interface definition, source compatibility opaque pointer in code.

Opaque Pointer (Also referred as Handle/Body Idiom, Pimpl Idiom, Cheshire Cat etc.):

To achieve binary compatibility we separate implementation from interface, so easy to say? The interface will have an indirection (pointer or Handle) to implementation (Body). An excellent article on Qt d-pointer (a.k.a. data pointer) is available on Qt developer page and KDE developer page.

Note that the concept of Opaque Pointer or Pointer to Implementation (PIMPL) is used extensively in many software libraries. I came across PIMPL in mobile software, Qt, Boost, many more. I am preparing an article for GeeksforGeeks, soon it will be published.

Also note the concept of q-pointer (pointer back to Handle class) that references the API class for any functionality defined in API class hierarchy.

Layout Engines - WebKit, Gecko, ...

22, April 2011

Layout Engine:

If we want to display an image on screen, how can we do it? We need an entity that paints the screen with the image content. These links helps as digital diary.

The traditional software frameworks like Windows Form, Qt widget, Java Swings etc. provides functionality to render the screen using static information called objects. There are more sophisticated rendering engines that are part of web browsers. The concept of layout engine came from early development of web (Netscape navigator).

Layout engine is the heart of any web browser; examples are WebKit (Chrome, Safari, Konqueror browsers) Gecko (Firefox, Epic browsers), Presto (Opera browser), Trident (Internet Explorer browser) etc. are few examples of well-known layout engines. Almost all of these layout engines are written in advanced C++.

The concept of layout engines is not limited to browsers. One can use it to design applications that needs screen rendering. For example, the outlook mail client, Winamp player display, RealPlayer displays are based on Microsoft propriety Trident engine. Gecko is also used in Thinderbird email client, Google Picasa, and other Internet application suites.

Examples of Webkit layout engine other than browser application can be found in Qt example code. The Qt documentation application is based on Webkit layout engine. It is interesting to imagine the applications of these layout engines. If we are handy with the engine design and few web standards, we can adopt the engine alone for variety of applications.

With the advancement of Cloud Computing Technology, the computer geeks need a thin client application that connects to a server in cloud. Brower based thin clients are going to play major role in future. Chrome design (see Webkit2 also) and development is one example. IE9 also very interesting. Time will answer well.

Script Engine:

Well, the layout engine itself can’t help much. It should be flexible to program for dynamism. The client side scripting languages like JavaScript play a critical role here. But one needs to parse these scripts in the browser at runtime. The parsing entity is called Script Engine. For more details read about Chakra and V8 script engines. Firefox uses optimized SpiderMonkey as script engine. The performance of script engine contributes to overall performance of browser considerably.

For details on Gecko see this link. For details of WebKit see this link. WebKit wiki. The code is free, start exploring their design and implementation. We can see the practical use of C++, design patterns, advanced C++ idioms, etc.

Friday, April 15, 2011

Trie – An efficient way to search data

15 April, 2011

Trie – An efficient way to search data

Trie is an information retrieval data structure efficiently extracts required data. For example, see the following,

When we type a name on smart phones it displays all the contacts containing the typed-in string. What data structure will fill such needs?
In advanced editors like Visual Studio, Qt Creator, etc… when we type the identifier name, it automatically displays all the members part of that object. How is it possible?
On Linux (even on emulated DOS in Windows), if we type characters “gcc” + TAB key, it displays all the commands with “gcc” as prefix. What data structures is appropriate here?

All the above cases use a kind of trie data structure. Perhaps, the practical implementation may vary based on kind of application.
There is a demand for tries on Geeksforgeeks. We will be posting few articles on these in coming weeks.

Monday, April 11, 2011

Differences between Semaphores and Mutexes

11, Apr 2011

What are the differences between semaphores and mutexes?. Are binary semaphore and mutex same or just a notational convenience?

It is typical interview question. Surprisingly, we get correct answers rare.

From my learnings during RTOS design, I am preparing an article for Geeksforgeeks (link), it will be published soon.

An interesting question, if you observe Windows word 2003, Adobe PDF reader, only one instance will be running in the task manager. Where as Windows Excel 2003 will create as many processes as number of excel documents opened. How to implement, only one instance of process must be running, invoking another instance should prompt "An instance is already running" message?

Sunday, April 10, 2011

Beautiful Architecture - Worth Reading (Productive Weekend)

10, Apr 2011.

Got new book "Beautiful Architecture". I got the book from local book shop. Good to know that the royalty on the book will be donated to "Doctors Without Boarders".

The book is collection of essays from experts that designed high end, and high availability systems. Yet to finish few chapters, left good learning experience.

Few topics to name,

1. Scalable Architectures

2. Resource Oriented Architectures

3. Data Centric Architectures (Facebook)

4. Virtualization (Xen Hypervisor)

5. Fault Tolerant Operating Systems

6. GNU Emacs and KDE (Qt)

7. Metacircular Virtual Machines

Happy and worthy weekend.

Friday, April 8, 2011

An Interview Experience

8 April, 2011

I don't like to present in interview panels unless really required. Today I had to be in an interview panel. Good puzzle I recapped after long time.

Two players A and B playing a game (similar to game of Nim). Given a pile of N coins, the players need to pick coins from the pile. In each move a player can pick atleast 1 and utmost m coins. The player who makes the last move is the winner.

For example if A starts the game, he can pick 1, 2, 3, ... m coins from the pile. Then B can pick 1, 2, 3, .... m coins from the remaining pile. Next A, followed by B, so on...

In the process, the player who makes last move (picks all remaining coins <= m from the pile) is the winner.

Now, given N = 15, m = 3, and A starts the game with some 1 <= x <= m coins. How many different ways are there that A can win the game?

Hint: Use state space analysis method (i.e. trees).

Pascal’s Identity - Theoretical Significance

April 8, 2011

We know that C (n, r) = C (n – 1, r – 1) + C (n – 1, r)

It is quite easy to prove it mathematically. But what is its significance in real life examples?

C (n, r) is selecting r elements from n elements. Pick a special element k from these n elements, we left with (n – 1) elements.

Now this r elements selection C (n, r) can be grouped into two categories,

A) r elements that contain the element k.

B) r elements that do not contain the element k.

From A) the special element k is in all r selections. Since k is part of r elements, we need to choose remaining (r – 1) elements from remaining (n – 1) elements, this can be done in C (n – 1, r – 1) ways.

From B) the special element k is not there in all r selections. It means, we will have to select all the r elements from available (n – 1) elements (we must exclude element k from n). This can be done in C (n – 1, r) ways.

Now it is evident that, the first term of RHS is combination containing special element k, and the second term is combination without the special element k. Sum of these two is selecting r elements from n elements.

Interesting proof, little confusing to understand, but useful in practice.

Pascal's Identity and Dynamic Programming:

If we note the identity, it is the base for recursive calculation of binomial coefficients. However, when plain recursion is used, it will lead to an exponential algorithm.

Recursion with memoization can result in an efficient algorithm. The idea behind this dynamic programming (programming we mean a method of tabulation) is to tabulate the first occurrence of binomial co-efficients.

Pascal's triangle can serve as the tabulation in the dynamic method. Each co-efficient is sum of coefficients in the previous row. The table can grow till min(k, i) where 0 <= i <= n.

Try the dynamic algorithm.

Note that there is no need of two dimensional array to memoization of partial results. This fact is evident from Pascal's triangle with an extra variable to store nCk.

Tuesday, March 29, 2011

Programming Style - A discipline of writing code

29, March 2011

Discipline is simple, yet powerful word. However, the meaning can vary from person to person (Remember that practice is always different from law :)). The same applies to programming.

When writing code we never exercise importance of meaningful and precise comments. I have seen many code snippets on Geeksforgeeks and other forums. Most of the code snippets posted by readers are missing comments. Readability, maintainability (simplicity), clarity and generality are key to write elegant code. Brevity also key principle of writing code.

Note that geting it right (or code is fine since it is working.... bla bla) are poor reasons to convince ourself. Accept comments if they are adding value.

About 2 years ago, I had written a program in Perl script to extract symbol table from binary executable. It was about 5000 lines. Yesterday I reviewed it again, based on a call from previous teammate. I am proud that the code is well commented in such a way any engineer in future can take it forward. I could able to recap all the data structures, algorithms easily from the comments.

I would recommend to read The Practice of Programming by Kernighan.

Saturday, March 19, 2011

Tournament Tree and Winner Tree

March 19, 2011.

How many minimum comparisons are needed to trace second smallest element in an unsorted array?

We can use a tournament tree approach based on adversary argument to prove minimum number of comparisons.

In tournament method, every number (element) is paired with other number successively to find the maximum number. This forms a tree like structure. We must have N - 1 comparisons to find maximum number.

The information gained while finding max number will be used in reducing the comparisons for second largest. It meas, we would need to consider only those which lost against max number to be considered for second largest.

From the tree structure, we can conclude such lost numbers can't be more than [log(N) - 1] (consider tree height).

Overall, at minimum we need ( N - 1 + log(N) - 1 = N + log(N) - 2 ) comparisons.

Read my post on Geeksforgeeks for details on Winner tree approach.

Wednesday, March 16, 2011

Count number of set bits in an array

There are various techniques to count bits in an integer. These methods can be integrated to count bits in an array. I have published a simple and efficient method on Geeksforgeeks.

In the post, observe the meta program using C/C++ preprocessor that generates code for main program.

Minesweeper program

A nice program written in Qt and Python.

For details see the following link,

http://lionel.textmalaysia.com/minesweeper-in-python-gui.html

Last week KDE India meeting was organised. The above program was explained by KDE geek Dinesh Sai. Thanks to him.

Friday, March 4, 2011

A puzzle (mathematics) - properties of decimal number system

5, March 2011.

I come across one geek blog, and there is one interesting question... "Given a number containing all numbers from 1 through 9, find next higher number with same digits!".

The question should trigger a hint. "Same digits" mean the next higher number will have all same digits (In other words, the difference between N and N_next should be multiple of 9). Since all the digits are same in both the numbers, difference between the two numbers will cancel out, and makes the next higher number increase by a factor which is multiple of 9. Also, since the number is higher one, we should look for increasing patterns from right to left.

For example consider

N = 125786493

We can see 125786439 can't be next higher number. We must look for first three digits. 125786493 should be transformed to 125786934 with a difference of 441 which is multiple of 9. Or simply 49 x 9 (observe the original number).

Another number - 157869324 transformed to 157869342 with difference of 18 = 2 x 9. Obviously, there is a pattern.

Logic is clear, can we write the program easily, ....? Not really! Try out.

Another variant "Find the next higher number with same number of set bits", I had published an article here.

How many levels?

4th, March, 2011

Tree is one interesting data structure in computer science. I like the properties of trees and graphs.

Sometimes we receive some junk mails, containing junk information something like if you don't forward the mail to next 3 people something bad will happen... bla... bla... bla... and some more BS... (these mails usually related to God, even if-at-all God exists, he might be laughing at such mails. Recently I have seen a funny mail, I will post on http://wemeanourthoughts.blogspot.com/ fun to read).

But as an Electronics and Computer Science Engineer, it fascinates me to find how many pages would go for waste because of sense-less people.

We can represent the above mail chain as an m-ary tree. Where m is the branching factor, such as 3 in the above case.

As a simple math, if the mail chain ends at L-th level, how much space would they all occupy in mail servers (ofcourse this is apart from the space on user machines, assume each mail takes few MBs since those mails contain pictures as well)?. It would be in few Giga Bytes of space.

If all the persons at X-level changes the number 3 to 6, how the output will be affected?

Reverse (Synthesis) Question:

Given there are N leaves in a binary tree, how many levels an optimally balanced binary tree can have?

Answer: log₂N + 1

In general, an m-ary tree with N leaves can have log_mN + 1 levels.

Friday, February 25, 2011

Shortest path in connecting all vertices

Given below 4 vertices on a two dimensional plane. Connect them in shortest possible way. In other words, there are 4 cities A, B, C and D at the corner of square of side (a) units. What will be the minimum cost of connecting these cities?

Those making circuit layouts of VLSI chips need to solve the puzzle smartly. Even 0.1% gain in space is huge on miniaturized chips.

Answer: Assuming the side of square as s, the minimum cost (road connecting all four cities) is s x (SQRT(3) + 1)

Tuesday, February 22, 2011

Splitting a binary search tree at a given node

22, Feb 2011
Given a node and root of tree, how to split the binary search tree into two different trees.

The idea is to walk down the tree till the node, and keep track of the path. Climb up the tree and separate the nodes into two different trees. The algorithm is simple.

split(z, T)
   x = z.left
   y = z.right

   splitter = z

   while(z.p is valid)
   if(z.p.left = z) // z is on left of its parent
   z = z.p
   z.left = y
   y.p = z
   y = z
   else // z is on right of its parent
   z = z.p
   z.right = x
   x.p = z
   x = z

   // add splitter if required or delete it
   insert(splitter,x)

   x.p = nil
   y.p = nil

   return x and y as two split trees.

Understanding the algorithm will be simple, if we can draw a tree and its walking as per above pseudo code.

Saturday, February 19, 2011

Combining (merge) two binary search trees

20th, February 2011

Given two binary search trees, how to combine them?

Note that the binary search tree has a property that every left subtree element will be less than or equal to the root element, and every right subtree element is greater than or equal to the root element.

Based on the property, let us assume that given trees T₁and T₂, such that every element in T₁ <= every element in T₂. It will be easy to merge these two trees.

We need to figure out the root of merged tree. The root element can be either right extreme node in T₁or left extreme node in T₂ to maintain BST property. So, the approach is to remove either of these nodes, make it as root, and attach T₁as left subtree, T₂as right subtree.

Thursday, February 10, 2011

Find K-th smallest element in a binary search tree (BST)

11, Feb 2011

The question related to order statistics. A BST (non-linear data structure) represents data in an order fashion that satisfies the binary search tree property of "left node key should be <= root and right node key should be >= root".
Since we need K-th smallest element, we need to maintain # of elements in either of the sub-trees. By maintaining the count we can get the sub tree nodes in O(1) time complexity.

Assume the root has n nodes in its left subtree. If n <= K, we will continue search (recursion) for the smallest element in the left subtree. If K = n + 1, the root has the K-th smallest element. If k > n + 1, then we continue search the right subtree for the (K - n- 1)-th smallest element.

There are other approaches, like inorder traversal by maintaining the count, and stop search at K-th element or trivial case of no K-th node.

I have published two approaches for finding k-th smallest element in BST (See the link).

Thursday, February 3, 2011

QtWebkit - CRASH() macro

I have come across interesting code snippet in QtWebkit code base.

#define CRASH() do { \
*(int *)(uintptr_t)0xbbadbeef = 0; \
((void(*)())0)(); /* More reliable, but doesn't say BBADBEEF */ \
} while(false)

The line,

((void(*)())0)();

caught my attention. Can you guess what is it doing? Pretty simple,

0 being casted to some type and is 'pointer type. It is of type "pointer to function taking nothing and returning nothing", and the function invoked. Ideally the code invoking function located address 0. I interpret it as power OFF simulation. Isn't it?

It is published on GeeksforGeeks.

Wednesday, January 26, 2011

My Tutorials on Geeksforgeeks

27, Jan, 2011

These days blogs are becoming digital diaries. Some times I am questioning myself, am I the person posted such comment/article? I must have another blog exclusively for these links, eventhough we have applications like del.icio.us.

The following are few links

Structure Member Alignment, Padding and Data Packing

Yet to update