Step by step Kaggle competition tutorial

A clean and concise guide to Data Science.

Datanice

Kaggle is a Data Science community where thousands of Data Scientists compete to solve complex data problems.

In this article we are going to see how to go through a Kaggle competition step by step.

The contest explored here is the San Francisco Crime Classification contest. The goal is to classify a crime occurrence knowing the time and place it happened.

Screenshot from 2016-04-10 10:06:15.png

View original post 916 more words

Advertisements

3. Reverse Bits

Question : Write a program that takes a 64-bit word, output the reverse of that word.
I would urge you to minimize the window and try to solve this question on your own first.

 

Method – 1 Brute Force Solution
Simply apply the bitswap method we used earlier on all the extreme bits.

int bitSwap(int num, int i, int j){
	if( ( ( num>>i ) & 1 ) != ( ( num>>j ) & 1 ) ){
		int bitMask = (1<<i) | (1<<j);
		num ^=bitMask;
	}
	return num;
}

int reverseBits(int num){
	if(num!=0){
		int noOfBits=log2(num)+1;
		for(int i=0, j=noOfBits-1; i<j ; i++,j--){
			num=bitSwap(num,i,j);
		}
	}
	return num;
}

Time Complexity : O(k), where k is the number of bits in the number.
Space Complexity : O(1).

Method -2 Cache!
Just like the parity bit solution we can cache the solution of smaller bits (say 16) and combine the results.

void precompute (void){
	for (int i=0;i<=int(pow(2,(sizeOfInt/4))-1);i++){
		Table[i]=reverseBits(i);
	//	cout<<i<<' '<<Table[i]<<endl;
	}

	return ;
}
int reverseBitsUsingTable(int reverseOf, int wordSize, int bitMask){
	if(reverseOf<(1<<sizeOfInt/4))
		return Table[reverseOf];

	if(reverseOf<(1<<2*sizeOfInt/4)) 		return  Table[(reverseOf >> (wordSize))&bitMask] |
		Table[(reverseOf & bitMask)]<<(wordSize);

	if(reverseOf<(1<<3*sizeOfInt/4)) 		return Table[(reverseOf >> (2*wordSize))&bitMask] |
		Table[(reverseOf >> (wordSize))&bitMask]<<(wordSize) |
		Table[(reverseOf & bitMask)]<<(2*wordSize); 	return  Table[((reverseOf >> (3*wordSize)))&bitMask] |
	Table[(reverseOf >> (2*wordSize))&bitMask]<<(wordSize) | 	Table[(reverseOf >> (wordSize))&bitMask]<<(2*wordSize) |
	Table[(reverseOf & bitMask)]<<(3*wordSize);

}

The Time Complexity : O(n/L + 2^L), where n is the number of bits in the word and L is the width of the word for which we cache the results. 2^L time will be taken to prepare the table, if L is known beforehand we can remove it from the notation, as we only talk about the growth of function in Big-Oh Notation.
The Space Complexity : O(2^L), only a constant amount of extra space required.

The full working code is available at Github, you can star the repository to get updates. You can also follow this blog for more such posts. Comment in your queries, report bugs, discuss!:)

Standard Template Library : An Introduction

What is the Standard Template Library(STL) ?
The Standard Template Library (STL) is a software library for the C++ programming language. It provides four components called containers, iterators, algorithms and functional. STL library lies in the std namespace.

Why do we need to use the STL?
1. There is no use in inventing a wheel that has been in presence for decades.
2. STL in C++ provides humongous number of ready-made common classes, such as containers that can be used in any built and user defined type, that supports elementary operations like (such as copying and assignment).
*You should know how to write these data structures yourself. This knowledge will help you choose the best STL data structure required for your need.

Containers :
Containers are objects that store data. Inbuilt implementations of various abstract data types are available in STL.

The container manages the storage space that is allocated for its elements and provides member functions to access them, either directly or through iterators (objects with similar properties to pointers).

Most containers have at least several member functions in common, and share functionalities. Which container is the best for the particular application depends not only on the offered functionality, but also on its efficiency for different workloads.

Header file :- In most of the cases, header files of the container have their name of the of form, <containerName>. For instance the header files of vector, stack, deque are <vector>, <stack>, <deque> respectively.

Iterators :
An iterator is any object that, pointing to some element in a range of elements (such as an array or a container), has the ability to iterate through the elements of that range using a set of operators (with at least the increment (++) and dereference (*) operators).

Header file :- <iterator>
There are 5 types of iterators :-
InputIterator 
OutputIterator
ForwardIterator
BidirectionalIterator
RandomIterator.

Algorithms :
The algorithms library defines functions for a variety of purposes (e.g. searching, sorting, counting, manipulating) that operate on ranges of elements. Note that a range is defined as [first, last) where last refers to the element past the last element to inspect or modify. The exhaustive list of functions available in algorithm can be read from cplusplus reference.

Header file :- <algorithm>

Functional :
Functional provides a set of predefined class templates for function objects, including operations for arithmetic, comparisons, and logic.

Header file :- <functional>

So now that you are aware of the basic STL terminology in the coming posts we will go through the features of the most commonly used containers one by one. Don’t forget to subscribe, love, code and discuss. 🙂

Numbers and recipes at Codenmath.com

A much awaited day, a long lost dream, finally comes to life with codenmath.com, a wish I had for 9 years now; has awaken only to become a reality. At last a place where I can share all my knowledge, of whatever I have. A belly that has been so full of words will finally get some peace.

I would be sharing my views on numbers, scripts, programs, movies and maybe eh humans! To start of, my favorite number is 28, the only number known to mankind that can be represented as a sum of first few prime numbers(2,3,5,7,11), first few composite numbers(4,6,8,10) and as the sum of its own proper factors(1,2,4,7,14). Fascinating.

Behold! fasten your seat-belts for the ride that is about to come! Also let us know your favorite number. 🙂

2. Swap bits

Question : Given a number and 2 indices return the number after swapping bits present at those indices.

Method-1 : Brute-Force
We can swap bits by following the given steps :-
1. Using bitmasks to extract the ith and jth bits.
2. Save the extracted bits to some local variables.
3. Write the jth  bit to the ith position and vice-versa.

int bitSwapBruteForce(int x, int i, int j){
	int temp1= ((x & (1<<i)) !=0);
	int temp2= ((x & (1<<j)) !=0);
	x = temp1==1 ? (x | ( temp1<<j) ) : (x & ~(1<<j));
	x = temp2==1 ? (x | ( temp2<<i) ) : (x & ~(1<<i));
	return x;
}

The Time Complexity : O(1), only a constant amount of operations are required.
The Space Complexity : O(1), only a constant amount of extra space required.

Method-2 : Oh! Its beauty.
Since a bit can take only two values we can do better, lets swap only when bits differ. We’ll use the magical XOR once again.


int bitSwapSuccinct(int num, int i, int j){
	if( ( ( num>>i ) & 1 ) != ( ( num>>j ) & 1 ) ){
		int bitMask = (1<<i) | (1<<j);
		num ^=bitMask;
	}
	return num;
}

As ith and jth bits of num differ, only 1 of them would be 1, resulting in bitMask having only one 1 in its bit representation. Now the XOR would swap the bits with help of bitMask.

For num=(1110)2, i=0, j=2;
bitMask=(1)2 | (100)2 = (101)2
num = (1110)2 XOR (0101)2 = (1011)2

The Time Complexity : O(1), only a constant amount of operations are required.
The Space Complexity : O(1), only a constant amount of extra space required.

The full working code is available at Github, you can star the repository to get updates. You can also follow this blog for more such posts. Comment in your queries, report bugs, discuss!:)

1. Parity of a Word

Question : Compute the parity of a 64-bit word.
Parity of a number refers to whether it contains an odd or an even number of 1’s in its binary representation. The parity of a number is 1 if it contains odd number of 1’s, else 0.

Method-1 : Brute Force
We can iteratively test the value of each bit while tracking the number of 1’s seen so far.

int Parity (long x){
	int result=0;
	while(x){
		result+=(x&1);
		x>>1;
	}
	return result%2;
}

 

The Time Complexity : O(n), where n is the number of bits in the word.
The Space Complexity : O(1), only a constant amount of extra space required.

Method-2 : A better Approach
We can apply the method that is used to compute the number of 1’s in the bit representation of a number.

int Parity (int x){
	int result=0;
	while(x){
		x=x&(x-1); // removes the lowest set bit.
		result^=1;
	}
	return result;
}

x-1 contains last set bit of x as unset, hence there AND unsets that bit.
The Time Complexity : O(k), where k is the number of SET bits.
The Space Complexity : O(1), only a constant amount of extra space is required.

Method-3: CACHE!
The original question asks us the compute the parity of a 64 bit number, previous methods are sufficiently fast to compute parity for such numbers, though if the computation has to take place for many numbers it is a better idea to cache our solutions and use them when required. Caching 2^64(9223372036854775808) numbers? Ah! does not look like a good idea. Better we can cache answers to first 2^16 numbers and then compute solution for larger numbers by combining the solutions of smaller segments. For a 64 bit number we’ll break the number into 4 segments of 16 bits. We’ll use the table to compute parities of all the 4 segments and then XOR the values to compute the final answer.

void precompute (void){
	for (int i=0;i<=int(pow(2,(sizeOfInt/4))-1);i++){
		Table[i]=parity(i);
		//cout<<i<<' '<<Table[i]<<endl; Uncomment this to check for yourself
	}
	return ;
}

 

int parityAns( int ParityOf, int BitMask, int wordSize){
	return ( Table [ParityOf>>( 3 * wordSize)]
	^ Table[( ParityOf >>  ( 2 * wordSize ) ) & BitMask]
	^ Table[( ParityOf >> ( wordSize ) ) & BitMask]
	^ Table[ ParityOf & BitMask] );
}


The Time Complexity : O(n/L + 2^L)
, where n is the number of bits in the word and L is the width of the word for which we cache the results. 2^L time will be taken to prepare the table, if L is known beforehand we can remove it from the notation, as we only talk about the growth of function in Big-Oh Notation.
The Space Complexity : O(2^L), only a constant amount of extra space required.

Method-4 The Final Blow
We can use the property of XOR to compute the parity.
say X consists of 4 bits(a,b,c,d), we ran the following steps on it.

X =        a     b     c     d
X=X^X>>1 = a    a^b   b^c   c^d
X=X^X>>2 = a    a^b  a^b^c a^b^c^d

The last bit contains the parity which we can extract by ANDing X with 1.

int parity(int ParityOf){
	int factorOfShift = sizeOfInt/2;
	while(factorOfShift){
		ParityOf^=(ParityOf>>factorOfShift);
		factorOfShift=factorOfShift/2;
	}
	return ParityOf & 0x1;
}

The Time Complexity : O(log n), where n is the number of bits in the word.
The Space Complexity : O(1), only a constant amount of extra space required.

Also we can combine the last 2 methods, the full working code to which is available at Github, you can star the repository to get updates. You can also follow this blog for more such posts. Comment in your queries, report bugs, discuss! 🙂

A series on Programming Interview Questions

I have decided to write complete solutions and analysis of the questions that are frequently asked in programming interviews at major tech. companies for the position of Software Engineer/Software Engineering Intern.

This series will contain exhaustive list of solutions for all the problems posed in The Elements Of Programming Interviews, I will also maintain a Github repository for all the codes. They can be accessed directly using this EPI link.