Data Structures and Algorithms

28 Apr 2015

Reading time ~6 minutes

Install Jupyter

It is a web app that allows us to write python scripts.It was originally called IPython notebook or IP notebook, now they have changed their name to Jupyter(as they have started to support multiple languages other than just py). These notebooks have the extension .ipynb

Steps for installation:

Go to their official downloads page.
Now run pip3 install jupyter
Now run jupyter notebook in terminal and the notebook opens in your web browser

Usage

We can create new cells, delete cells, copy cells, move cells

Shift+Enter is the shortcut to run the script in the cell

Shift+Tab write after a func will give us docstring/help info about that func

obj. - now if we press tab it shows us the methods that can be called on the obj

We can change the cell type to code/markdown

We can restart a kernel if say a while/for loop goes in infinite mode and we would like to interrupt and restart it

Basic workflow

def solution(num1,num2)
	return num1+num2

from nose.tools import assert_equal

class SolutionTest(object):
	
	def test(self,sol):
		assert_equal(sol(2,2),4)
		assert_equal(sol(4,4),8)
		print("ALL TEST CASES PASSED")

# Run tests
t = SolutionTest()
t.test(solution)

Algorithm analysis

Algorithm is a procedure to solve a problem

def sum1(n):
	final_sum = 0
 
	for x in range(n+1):
		final_sum +=x

	return final_sum

def sum2(n):
	return (n*(n+1)/2)

How can we compare the algorithms? 1. Time 2. Space

We can use the %timeit magic func in jupyter to compare the time taken by functions

Timing the code

Sometimes its important to know how long our code is taking to run, or at least know if a particular line of code is slowing down our entire project. Python has a built-in timing module to do this.

This module provides a simple way to time small bits of Python code. It has both a Command-Line Interface as well as a callable one. It avoids a number of common traps for measuring execution times. Lets learn about timeit!

import timeit

"""
Lets use time it to time various methods of creating the string '0-1-2-3-.....-99'
We'll pass two arguments the actual line we want to test encapsulated as a string and the number of times we wish to run it. Here we'll choose 10,000 runs to get some high enough numbers to comapre various methods
"""

# For loop
timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)
#returns 0.2894139289855957

# List comprehension
timeit.timeit('"-".join([str(n) for n in range(100)])', number=10000)
#returns 0.25658297538757324

# Map()
timeit.timeit('"-".join(map(str, range(100)))', number=10000)
#returns 0.16328215599060059

"""Great! We see a significant time difference by using map()! This is good to know and we shuold keep this in mind.
"""

Now lets see iPython’s magic function %timeit iPython’s %timeit will perform the code in the same line a certain number of times (loops) and will give you the fastest performance time (best of 3).

Lets repeat the above examinations using iPython magic!

%timeit "-".join(str(n) for n in range(100))
# prints this: 10000 loops, best of 3: 23.8 µs per loop

%timeit "-".join([str(n) for n in range(100)])
#prints this: 10000 loops, best of 3: 21.3 µs per loop

%timeit "-".join(map(str, range(100)))
#prints this: 100000 loops, best of 3: 13.5 µs per loop

Great! We arrive at the same conclusion. Its also important to note that iPython will limit the amount of real time it will spend on its timeit procedure. For instance if running 100000 loops took 10 minutes, iPython would automatically reduce the number of loops to something more reasonable like 100 or 1000.

Big-O notation

Describes how quickly runtime will grow relative to the input

O(n) means the runtime grows linearly with the input size

Big-O Examples

# O(1) constant:
def func_constant(values):
	print values[0]


# O(n) linear:
def func_list(list):
	for i in list:
		print(i) 
# This is O(n)

def func_list(list):
	for i in list:
		print(i)
	for i in list:
		print(i)
#This is O(2n) and not O(n^2), but when n -> infinity
#O(n) = O(2n), so we can drop the constant 2 and both the algorithms have almost the same performance. This is called as dropping the insignificant constants or terms. 


# O(n^2) quadratic:
def func_quad(lst):
	for item1 in lst:
		for item2 in lst:
			print(item1,item2)

Best case, Worst case, Avg case

def matcher(lst,match):
	
	for item1 in lst:
		if item1 == match:
			return True
	return False

lst = [1,2,3,4,5,6,7,8,9,10]

matcher(lst,1) #Best case O(1), match happens first time

matcher(lst,11) #Worst case O(n), match happens after n iterations

Array sequences

Dynamic Arrays:

The array in python is dynamic in nature. We dont have to specify the size taken by the array in advance. We can keep growing it as ans when needed.

They are similar to hermit crabs which hunt for bigger shells/homes when they grow in size and are in need of it.

Anagram problem

If 2 strings can be written with the same letters then they are anagrams. Eg: “Clint Eastwood”, “old west action” Go given 2 strings check if they are anagrams?(Ignore case)

If each letters frequency are the same in both the strings then thay are anagrams of each other, ie if 2 string are equal to each other after sorting then they are anagrams.

def anagram2(s1,s2):
	#remove whitespace and turn to lower case
	s1 = s1.replace(" ","").lower()
	s1 = s1.replace(" ","").lower()

   return sorted(s1) == sorted(s2)

#Lets solve this without sorting func manually

def anagram(s1,s2):
	#remove whitespace and turn to lower case
	s1 = s1.replace(" ","").lower()
	s1 = s1.replace(" ","").lower()

	#if the no of letters are not same they we can immediately say they #are not anagrams
	if len(s1) != len(s2):
		return False

	#Now lets use a dictionary to count the frequency of each letter 
	count = {}

	for letter in str:
		if letter in count:
			count[letter] += 1
		else:
			count[letter] = 1
	# So lets loop through each letter in str2 and sub 1 for each letter 
	# Finally if all the values in the dict are zero then anagrams

	for letter in s2:
		if letter in count:
			count[letter] -=1
		else
			count[letter] = 1
			#set this to 1 so that the next for loop retuens False

	for k in count:
		if count[k]!=0:
			return False

	return True

Array Sum problem

Given an integer array, output all the unique pairs that sum up to a specific value k. So the input: pair_sum([1,3,2,2],4), would return 2 pairs:(1,3) (2,2)

NOTE: FOR TESTING PURPOSES, CHANGE YOUR FUNCTION SO IT OUTPUTS THE NUMBER OF PAIRS

Solution: The O(N) algorithm uses the set data structure. We perform a linear pass from the beginning and for each element we check whether k-element is in the set of seen numbers. If it is, then we found a pair of sum k and add it to the output. If not, this element doesn’t belong to a pair yet, and we add it to the set of seen elements. The algorithm is really simple once we figure out using a set. The complexity is O(N) because we do a single linear scan of the array, and for each element we just check whether the corresponding number to form a pair is in the set or add the current element to the set. Insert and find operations of a set are both average O(1), so the algorithm is O(N) in total.

def pair_sum(arr,k):
    
    if len(arr)<2:
        return
    
    # Sets for tracking
    seen = set()
    output = set()
    
    # For every number in array
    for num in arr:
        
        # Set target difference
        target = k-num
        
        # Add it to set if target hasn't been seen
        if target not in seen:
            seen.add(num)
        
        else:
            # Add a tuple with the corresponding pair
            output.add( (min(num,target),  max(num,target)) )
    
    
    # FOR TESTING
    return len(output)
    # Nice one-liner for printing output
    #return '\n'.join(map(str,list(output)))

"""
RUN THIS CELL TO TEST YOUR SOLUTION
"""
from nose.tools import assert_equal

class TestPair(object):
    
    def test(self,sol):
        assert_equal(sol([1,9,2,8,3,7,4,6,5,5,13,14,11,13,-1],10),6)
        assert_equal(sol([1,2,3,1],3),1)
        assert_equal(sol([1,3,2,2],4),2)
        print 'ALL TEST CASES PASSED'
        
#Run tests
t = TestPair()
t.test(pair_sum)

datastructures algorithms O(n)analysis