Dynamic programming

Leave a comment

Dynamic programming is a method for solving a complex problem by breaking a big problem down into several simpler smaller ones. Take this shortest path problem for example.

counting-paths.PNG

The goal here is to count how many paths there are from the upper left hand corner of the grid down to the bottom right. The rule is you can only continuously move down and to the right.

A dynamic programming approach to this problem would be to note that the number of paths from the start to the end can be broken down by summing the number of paths from A to the end, also with B to the end. So now we have broken down our big problem, into smaller sub problems.

recusion1.PNG

That that these subproblems can then be further broken down into as we walk through the grid.

recursion2.PNG

This recursion stops when we get to the end and the number new paths left is 1.

One way of translating this into a recursive algorithm would be to write something like this.

With code like this

DynamicProgrammingRecursive.java

public class DynamicProgrammingRecursive {

    private int grid[][];

    public DynamicProgrammingRecursive(int[][] grid) {
        this.grid = grid;
    }

    public int countPaths(int row, int col) {
        if (!isValidSquare(row, col)) return 0;
        if (isAtEnd(row, col)) return 1;
        return countPaths(row + 1, col) + countPaths(row, col + 1);
    }

    public boolean isValidSquare(int row, int col) {
        return isInBounds(row, col) && !isBlocked(row, col);
    }

    public boolean isBlocked(int row, int col) {
        return this.grid[row][col] == 1;
    }

    public boolean isInBounds(int row, int col) {
        return (row < grid.length && col < grid[0].length);
    }

    public boolean isAtEnd(int row, int col) {
        return grid.length - 1 == row && grid[row].length - 1 == col;
    }

}

foo

import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

public class DynamicProgrammingRecursiveTest {

    private DynamicProgrammingIterative empty2x2;
    private DynamicProgrammingIterative empty3x3;

    @Before
    public void setUp() throws Exception {

        int[][] paths2x2 = new int[][] {
                {0,0},
                {0,0}
        };
        empty2x2 = new DynamicProgrammingIterative(paths2x2);

        int[][] paths3x3 = new int[][] {
                {0,0,0},
                {0,0,0},
                {0,0,0}
        };
        empty3x3 = new DynamicProgrammingIterative(paths3x3);

    }

    /*
        2 1
        1 x
     */
    @Test
    public void TwoByTwoEmptyPathCount() throws Exception {
        Assert.assertEquals(2, empty2x2.countPaths(1,1));
    }

    /*
    6 3 1
    3 2 1
    1 1 x
    */
    @Test
    public void ThreeByThreeEmptyPathCount() throws Exception {
        Assert.assertEquals(6, empty3x3.countPaths(0,0));
    }

    @Test
    public void IsAtEnd() throws Exception {
        int row = 0;
        int column = 0;
        Assert.assertTrue(empty2x2.isAtEnd(row, column));
    }

    @Test
    public void IsInBounds() throws Exception {
        Assert.assertTrue(empty2x2.isInBounds(0,0));
        Assert.assertTrue(empty2x2.isInBounds(0,1));
        Assert.assertTrue(empty2x2.isInBounds(1,0));
        Assert.assertTrue(empty2x2.isInBounds(1,1));
        Assert.assertFalse(empty2x2.isInBounds(0,2));
        Assert.assertFalse(empty2x2.isInBounds(2,0));
    }

    @Test
    public void IsBlocked() throws Exception {
        int[][] paths = new int[][] {
                {0,0},
                {0,1}
        };
        DynamicProgrammingRecursive brute = new DynamicProgrammingRecursive(paths);
        Assert.assertFalse(brute.isBlocked(0,0));
        Assert.assertFalse(brute.isBlocked(0,1));
        Assert.assertFalse(brute.isBlocked(1,0));
        Assert.assertTrue(brute.isBlocked(1,1));
    }

    @Test
    public void CountBlockedPaths() throws Exception {
        int[][] paths = new int[][] {
                {0,0,0,0,0,0,0,0},
                {0,0,1,0,0,0,1,0},
                {0,0,0,0,1,0,0,0},
                {1,0,1,0,0,1,0,0},
                {0,0,1,0,0,0,0,0},
                {0,0,0,1,1,0,1,0},
                {0,1,0,0,0,1,0,0},
                {0,0,0,0,0,0,0,0}
        };
        DynamicProgrammingRecursive brute = new DynamicProgrammingRecursive(paths);
        Assert.assertEquals(27, brute.countPaths(0,0));
    }

}

We just traverse down and to the right, making sure that we are not out of bounds or in a blocked off square. And when we get to the end, we just return 1, allowing the recursion of the algorithm to walk back up to the top where we started.

The validSquare method does the boundary checking as well as making sure we are not in a blocked off square. If we are blocked return 0. If we are at the end return 1. Else recurse and do the calc for the next down and right squares of the grid.

One Tip for when coding grids

Whenever you do matrix algorithms use row and column for your variable names rather than x and y.

The reason for this is the row and column translates into [y][x] and its a common cause for mistakes. So when dealing with matrices using row and column instead. Just easier.

Memoization

One thing you may have noticed in our previous examples, is that we calculate the number of paths from C to the end twice. That is something we can store for future look ups. And that’s a memoization approach.

DynamicProgrammingMemoized.java

public class DynamicProgrammingMemoized {

    private int grid[][];
    private int paths[][];

    public DynamicProgrammingMemoized(int[][] grid) {
        this.grid = grid;
        this.paths = new int[grid.length][grid[0].length]; // assume square
    }

    public int countPaths(int row, int col) {
        if (!isValidSquare(row, col)) return 0;
        if (isAtEnd(row, col)) return 1;
        if (this.paths[row][col] == 0) {
            this.paths[row][col] = countPaths(row + 1, col) + countPaths(row, col + 1);
        }
        return this.paths[row][col];
    }

    public boolean isValidSquare(int row, int col) {
        return isInBounds(row, col) && !isBlocked(row, col);
    }

    public boolean isBlocked(int row, int col) {
        return this.grid[row][col] == 1;
    }

    public boolean isInBounds(int row, int col) {
        return (row < grid.length && col < grid[0].length);
    }

    public boolean isAtEnd(int row, int col) {
        return grid.length - 1 == row && grid[row].length - 1 == col;
    }
}

DynamicProgrammingMemoizedTest.java

import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

public class DynamicProgrammingMemoizedTest {

    private DynamicProgrammingMemoized empty2x2;
    private DynamicProgrammingMemoized empty3x3;

    @Before
    public void setUp() throws Exception {

        int[][] paths2x2 = new int[][] {
                {0,0},
                {0,0}
        };
        empty2x2 = new DynamicProgrammingMemoized(paths2x2);

        int[][] paths3x3 = new int[][] {
                {0,0,0},
                {0,0,0},
                {0,0,0}
        };
        empty3x3 = new DynamicProgrammingMemoized(paths3x3);

    }

    /*
    2 1
    1 x
     */
    @Test
    public void TwoByTwoEmptyPathCount() throws Exception {
        Assert.assertEquals(2, empty2x2.countPaths(0,0));
    }

    /*
    6 3 1
    3 2 1
    1 1 x
    */
    @Test
    public void ThreeByThreeEmptyPathCount() throws Exception {
        Assert.assertEquals(6, empty3x3.countPaths(0,0));
    }

    @Test
    public void CountBlockedPaths() throws Exception {
        int[][] paths = new int[][] {
                {0,0,0,0,0,0,0,0},
                {0,0,1,0,0,0,1,0},
                {0,0,0,0,1,0,0,0},
                {1,0,1,0,0,1,0,0},
                {0,0,1,0,0,0,0,0},
                {0,0,0,1,1,0,1,0},
                {0,1,0,0,0,1,0,0},
                {0,0,0,0,0,0,0,0}
        };
        DynamicProgrammingRecursive brute = new DynamicProgrammingRecursive(paths);
        Assert.assertEquals(27, brute.countPaths(0,0));
    }
}

Same idea as the recursive implementation. Only here we store the results as we calculate them which speeds the algorithm and makes the calculations happen a lot faster.

Traditional Dynamic Programming Approach

A more traditional dynamic approach to this problem would be to start at the end and work backwards working our way up.

If we think about what the recursive approach does, the first first concrete values it gets, other than blocked paths, are these two in the bottom right. 1 and 1.

Now, we can we go from there. If we take that recursion one step up, it may be filling one of the next three cells.

If we start on the left, all of these cells have just one path to the bottom right. So all of these values are one.

And if we continue doing this for the other cells, we can continue walking up and filling in values for the other cells as follows for the entire grid.

DynamicProgrammingIterative.java

public class DynamicProgrammingIterative {

    private int grid[][];
    private int paths[][];

    public DynamicProgrammingIterative(int[][] grid) {
        this.grid = grid;
        this.paths = new int[grid.length][grid.length];
    }

    public int countPaths(int row, int col) {
        if (!isValidSquare(row, col)) return 0;
        if (isAtEnd(row, col)) return 1;
        if (isAtBeginning(row, col)) {
            paths[row][col] = 1;
        }
        if (paths[row][col] == 0) {

            int bottomCell = 0;
            int rightCell = 0;

            if (isValidSquare(row + 1, col)) {
                bottomCell = paths[row + 1][col];
            }

            if (isValidSquare(row, col + 1)) {
                rightCell = paths[row][col + 1];
            }

            paths[row][col] = bottomCell + rightCell;
        }
        return countPaths(row - 1, col) + countPaths(row, col - 1);
    }

    public boolean isValidSquare(int row, int col) {
        if (!isInBounds(row, col)) {
            return false;
        }
        if (isBlocked(row, col)) {
            return false;
        }
        return true;
    }

    public boolean isBlocked(int row, int col) {
        return grid[row][col] == 1;
    }

    public boolean isInBounds(int row, int col) {
        if (row < 0 || col < 0) {
            return false;
        }
        return (row < grid.length && col < grid[0].length);
    }

    public boolean isAtEnd(int row, int col) {
        return row == 0 && col == 0;
    }

    public boolean isAtBeginning(int row, int col) {
        return row == grid.length - 1 && col == grid[0].length - 1;
    }

    public void print2x2() {
        System.out.println(paths[0][0] + " " + paths[0][1]);
        System.out.println(paths[1][0] + " " + paths[1][1]);
    }

    public void print3x2() {
        System.out.println(paths[0][0] + " " + paths[0][1] + " " + paths[0][2]);
        System.out.println(paths[1][0] + " " + paths[1][1] + " " + paths[1][2]);
        System.out.println(paths[2][0] + " " + paths[2][1] + " " + paths[2][2]);
    }

    public void print8x8() {
        System.out.println(paths[0][0] + " " + paths[0][1] + " " + paths[0][2] + paths[0][3] + " " + paths[0][4] + " " + paths[0][5] + paths[0][6] + " " + paths[0][7]);
        System.out.println(paths[1][0] + " " + paths[1][1] + " " + paths[1][2] + paths[1][3] + " " + paths[1][4] + " " + paths[1][5] + paths[1][6] + " " + paths[1][7]);
        System.out.println(paths[2][0] + " " + paths[2][1] + " " + paths[2][2] + paths[2][3] + " " + paths[2][4] + " " + paths[2][5] + paths[2][6] + " " + paths[2][7]);
        System.out.println(paths[3][0] + " " + paths[3][1] + " " + paths[3][2] + paths[3][3] + " " + paths[3][4] + " " + paths[3][5] + paths[3][6] + " " + paths[3][7]);
        System.out.println(paths[4][0] + " " + paths[4][1] + " " + paths[4][2] + paths[4][3] + " " + paths[4][4] + " " + paths[4][5] + paths[4][6] + " " + paths[4][7]);
        System.out.println(paths[5][0] + " " + paths[5][1] + " " + paths[5][2] + paths[5][3] + " " + paths[5][4] + " " + paths[5][5] + paths[5][6] + " " + paths[5][7]);
        System.out.println(paths[6][0] + " " + paths[6][1] + " " + paths[6][2] + paths[6][3] + " " + paths[6][4] + " " + paths[6][5] + paths[6][6] + " " + paths[6][7]);
        System.out.println(paths[7][0] + " " + paths[0][1] + " " + paths[7][2] + paths[7][3] + " " + paths[0][4] + " " + paths[7][5] + paths[7][6] + " " + paths[7][7]);
    }
}

DynamicProgrammingIterativeTest.java

import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

public class DynamicProgrammingIterativeTest {

    /*
    2 1
    1 x
     */
    @Test
    public void TwoByTwoEmptyPathCount() throws Exception {
        int[][] paths2x2 = new int[][] {
                {0,0},
                {0,0}
        };
        DynamicProgrammingIterative empty2x2 = new DynamicProgrammingIterative(paths2x2);
        Assert.assertEquals(2, empty2x2.countPaths(1,1)); // start lower right
    }

    /*
    1 x
    1 1
    */
    @Test
    public void TwoByTwoOneCellBlocked() throws Exception {
        int[][] paths2x2 = new int[][] {
                {0,1},
                {0,0}
        };
        DynamicProgrammingIterative iterative = new DynamicProgrammingIterative(paths2x2);
        Assert.assertEquals(1, iterative.countPaths(1,1));
    }

    /*
    6 3 1
    3 2 1
    1 1 1
    */
    @Test
    public void ThreeByThreeEmpty() throws Exception {
        int[][] paths3x3 = new int[][] {
                {0,0,0},
                {0,0,0},
                {0,0,0}
        };
        DynamicProgrammingIterative empty3x3 = new DynamicProgrammingIterative(paths3x3);
        Assert.assertEquals(6, empty3x3.countPaths(2,2));
    }

    /*
    3 1 0
    2 1 x
    1 1 1
    */
    @Test
    public void ThreeByThreeBlocked() throws Exception {
        int[][] paths3x3 = new int[][] {
                {0,0,0},
                {0,0,1},
                {0,0,0}
        };
        DynamicProgrammingIterative iterative = new DynamicProgrammingIterative(paths3x3);
        Assert.assertEquals(3, iterative.countPaths(2,2));
    }

    /*
    3 1 0
    2 1 x
    1 1 1
    */
    @Test
    public void ThreeByThreeBlocked2() throws Exception {
        int[][] paths3x3 = new int[][] {
                {0,0,0},
                {1,0,0},
                {0,0,0}
        };
        DynamicProgrammingIterative iterative = new DynamicProgrammingIterative(paths3x3);
        Assert.assertEquals(3, iterative.countPaths(2,2));
    }

    /*
    1 1 1
    x x 1
    1 1 1
    */
    @Test
    public void ThreeByThreeBlocked3() throws Exception {
        int[][] paths3x3 = new int[][] {
                {0,0,0},
                {1,1,0},
                {0,0,0}
        };
        DynamicProgrammingIterative iterative = new DynamicProgrammingIterative(paths3x3);
        Assert.assertEquals(1, iterative.countPaths(2,2));
    }

    @Test
    public void CountBlockedPaths() throws Exception {
        int[][] paths = new int[][] {
                {0,0,0,0,0,0,0,0},
                {0,0,1,0,0,0,1,0},
                {0,0,0,0,1,0,0,0},
                {1,0,1,0,0,1,0,0},
                {0,0,1,0,0,0,0,0},
                {0,0,0,1,1,0,1,0},
                {0,1,0,0,0,1,0,0},
                {0,0,0,0,0,0,0,0}
        };
        DynamicProgrammingIterative iterative = new DynamicProgrammingIterative(paths);
        Assert.assertEquals(27, iterative.countPaths(7,7));
    }

}

Now the advantage of doing things this way is that we have used slightly less memory. Runtime is still O(n^2) but we’ve reduced the actual amount of memory used slightly because we no longer have to use the call stack.

So where does this leave us?

The main take away of dynamic programming problems is that when you have a big problem that consists of many similarly oriented smaller ones, you can often implement them using a combination of recursion, with memoization for performance.

Not also that if you look at the tests I wrote, a good place to start with any of these problems is which something really small and simple (like a 2×2 or 3×3 matrix), write tests, and then work your way up from there.

Happy coding!

Links that help

http://www.sanfoundry.com/dynamic-programming-solutions-longest-common-substring-problem/

https://stackoverflow.com/questions/4000169/getting-the-array-length-of-a-2d-array-in-java

https://stackoverflow.com/questions/12231453/syntax-for-creating-a-two-dimensional-array

Advertisements

Memoization

Leave a comment

In computing, memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached results when required.

The most common example of memoization is the implementation of a fibonacci series.

Screen Shot 2017-11-19 at 10.19.56 AM.png

Because a fibonacci series recursively calls upon itself, when calculating the series for any larger numbers, you end up making the same calculations over and over again.

With memoization you store the results of certain calculations along the way, and then simply lookup and return them when called again in the future.

For example, here is a naive implementation of a fibonacci series. One that does the full on fibonacci calculation for every element beginning to end.

FibonacciNaive.java

public class FibonacciNaive {

    public int fib(int n) {
        System.out.println("n = " + n);
        if (n <= 0) {
            return 0;
        } else if (n == 1) {
            return 1;
        } else {
            return fib(n - 1) + fib(n - 2);
        }
    }
}

FibonacciNaive.java

public class FibonacciNaive {

    public int fib(int n) {
        System.out.println("n = " + n);
        if (n <= 0) {
            return 0;
        } else if (n == 1) {
            return 1;
        } else {
            return fib(n - 1) + fib(n - 2);
        }
    }
}

This is an extremely slow implementation. fib(30) here takes about 19 seconds.

By contrast, here is the say series, only memoized. Here we store the result of fib(n) as we calculate it, and then use it each time in subsequent calculations.

Screen Shot 2017-11-19 at 10.29.24 AM.png

FibonacciMemoized.java

public class FibonacciMemoized {

    private int[] memo = new int[1001];

    public int fib(int n) {
        System.out.println("n = " + n);
        if (n <= 0) {
            return 0;
        } else if (n == 1) {
            return 1;
        } else if (memo[n] == 0){
            memo[n] = fib(n - 1) + fib(n - 2);
        }
        return memo[n];
    }
}

This one does fib(30) is less than a second. You have to bump it up to 1000 to approx 20s. So a massive increase in computational performance.

That’s the power and beauty of memoization. Also our runtime is now linear with memoization instead of exponential.

So that’s the power of memoization. Happy coding!

Here is the test harness.

FibonacciTest.java

import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

import java.util.Date;

public class FibonacciTest {

    private FibonacciNaive naive;
    private FibonacciMemoized memo;

    @Before
    public void setUp() throws Exception {
        naive = new FibonacciNaive();
        memo = new FibonacciMemoized();
    }

    @Test
    public void Naive() throws Exception {
        Assert.assertEquals(0, naive.fib(0));
        Assert.assertEquals(1, naive.fib(1));
        Assert.assertEquals(1, naive.fib(2));
        Assert.assertEquals(2, naive.fib(3));
        Assert.assertEquals(3, naive.fib(4));
        Assert.assertEquals(5, naive.fib(5));
        Assert.assertEquals(8, naive.fib(6));
        Assert.assertEquals(13, naive.fib(7));
        Assert.assertEquals(21, naive.fib(8));
    }

    @Test
    public void Memoized() throws Exception {
        Assert.assertEquals(0, memo.fib(0));
        Assert.assertEquals(1, memo.fib(1));
        Assert.assertEquals(1, memo.fib(2));
        Assert.assertEquals(2, memo.fib(3));
        Assert.assertEquals(3, memo.fib(4));
        Assert.assertEquals(5, memo.fib(5));
        Assert.assertEquals(8, memo.fib(6));
        Assert.assertEquals(13, memo.fib(7));
        Assert.assertEquals(21, memo.fib(8));
    }

    @Test
    public void RecordTimeNaive() throws Exception {
        long startTime = System.currentTimeMillis();
        naive.fib(30);
        long endTime = System.currentTimeMillis();
        long elapsedTime = (endTime - startTime) / 1000;
        System.out.println("elapsedTime = " + elapsedTime); // 19s
    }

    @Test
    public void RecordTimeMemoized() throws Exception {
        long startTime = System.currentTimeMillis();
        memo.fib(1000);
        long endTime = System.currentTimeMillis();
        long elapsedTime = (endTime - startTime) / 1000;
        System.out.println("elapsedTime = " + elapsedTime); // 20s
    }
}

Hash Tables

Leave a comment

A hash table is a key value lookup. It gives you a way of storing a value, against any given key, for very quick lookups.

what-is-hash-table.PNG

The key and value can be basically any kind of data structure. Strings are popular. But so long as the key has a hash function, it could be anything.

How does hashing work

At a high-level, we want to store our objects in a array. But how do we jump from a string, to an actual index in the array? This is what a hash function does.

A hash function takes some string, converts it into an integer, and then that integer gets converted into an index for that array, and hence the object we are looking for.

So the hashcode isn’t the index in the array. We go from the key, to the hashcode, and then to the index.

string-hashcode-index.PNG

And the reason for this is that the array storing the values might be much much smaller than all the possible combinations of hash’s out there.

Collisions

One interesting things about hash codes is that it is possible for multiple different keys to have the same hashcode. There are in infinite number of strings, but only a finite number of hashcodes. Also, because we are further reducing the number of keys into a smaller array, the potential for key collisions when mapping can occur.

collisons.PNG

That’s why a lot of time and effort goes into finding hash algorithms that spread distributions of keys out. In an effort to avoid. But collisions do periodically occur. So we need to deal with them.

The most common way to deal with this collisions is to use chaining. Which basically stores all the collisions for a given key in a linked list. And then when a collisions occurs, walk each element of the linked list until you find the object with the perfect key match. That means the original keys are stored in these chains too.

And for those elements for where there is no collision, we have a one element linked list.

Runtime

Usually we assume that we have a good hash function that gives a good distribution, so for the purposes of an interview you can assume this is constant time. But if you had a bad hash worst case it could be O(n).

hash-table-runtime.PNG

Code

HashTable.java

public class HashTable {

    private int INITIAL_SIZE = 16;
    private HashEntry[] data; // LinkedList

    class HashEntry {
        String key;
        String value;
        HashEntry next;

        HashEntry(String key, String value) {
            this.key = key;
            this.value = value;
            this.next = null;
        }
    }

    HashTable() {
        data = new HashEntry[INITIAL_SIZE];
    }

    void put(String key, String value) {

        // Get the index
        int index = getIndex(key);

        // Create the linked list entry
        HashEntry entry = new HashEntry(key, value);

        // If no entry there - add it
        if (data[index] == null) {
            data[index] = entry;
        }
        // Else handle collision by adding to end of linked list
        else {
            HashEntry temp = data[index];
            while(temp.next != null) {
                temp = temp.next;
            }
            temp.next = entry;
        }

        // the above while loop walks down the chain to the end
        // and then assigns the new entry as the last element
    }

    String get(String key) {

        // Get the index
        int index = getIndex(key);

        // Get the value
        HashEntry temp = data[index];

        // if value
        if (temp != null) {
            // else walk chain until find match
            while (!temp.key.equals(key) && temp.next !=null) {
                temp = temp.next;
            }

            return temp.value;
        }

        // else return null
       return null;
    }

    private int getIndex(String key) {
        // Get the hash code
        int hashCode = key.hashCode();

        // Convert to index
        int index = hashCode % INITIAL_SIZE;

        // Hack to force collision for testing
        if (key.equals("PeterCollision") || key.equals("PaulCollision")) {
            index = 4;
        }

        return index;
    }

    @Override
    public String toString() {
        int bucket = 0;
        StringBuilder hashTableStr = new StringBuilder();
        for (HashEntry entry : data) {
            if(entry == null) {
                continue;
            }
            hashTableStr.append("\n bucket[")
                    .append(bucket)
                    .append("] = ")
                    .append(entry.toString());
            bucket++;
            HashEntry temp = entry.next;
            while(temp != null) {
                hashTableStr.append(" -> ");
                hashTableStr.append(temp.toString());
                temp = temp.next;
            }
        }
        return hashTableStr.toString();
    }
}

HashTableTest.java

import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

public class HashTableTest {

    private HashTable hashTable;

    @Before
    public void setUp() throws Exception {
        hashTable = new HashTable();
    }

    @Test
    public void PutAndGet() {
        hashTable.put("Peter", "PeterValue");
        Assert.assertEquals("PeterValue", hashTable.get("Peter"));
    }

    @Test
    public void Collision() {
        // these keys will collide (hacked in solution for testing)
        hashTable.put("PeterCollision", "PeterValue");
        hashTable.put("PaulCollision", "PaulValue");

        Assert.assertEquals("PeterValue", hashTable.get("PeterCollision"));
        Assert.assertEquals("PaulValue", hashTable.get("PaulCollision"));

        System.out.println(hashTable.toString());
    }
}

Links that help

http://www.dreamincode.net/forums/topic/273783-the-use-of-the-modulo-operator/
http://www.java67.com/2014/11/modulo-or-remainder-operator-in-java.html


Stacks and Queues

Leave a comment

Stacks and Queues are pretty similar data structures. The difference is in how they remove elements.

A Stack is LIFO – Last In First Out. Much like a stack of books.
A Queue is FIFI – First In First Out. If you are the first to line up, you are the first to come out.

While a Queue is more like when you are waiting at a super market. FILO – First in Last Out.

The are both linear data structures.
They can be implemented as Arrays or Linked Lists.

Here is a Stack and Queue implemented as a Linked List.

Stack.java

public class Stack {

    private class Node {
        private int data;
        private Node next;
        private Node (int data) {
            this.data = data;
        }
    }

    private Node top; // add and remove things here

    public boolean isEmpty() {
        return top == null;
    }

    public int peek() {
        return top.data;
    }

    public void push(int data) {
        // Create new node
        // Set it's next to be top
        // Set top to be the new node

        Node node = new Node(data);
        node.next = top;
        top = node;
    }

    public int pop() {
        // Store the value you want to return
        // Set the current top.next to be the new top
        // return the value

        // without null checks
        int data = top.data;
        top = top.next;
        return data;

        // with null check
//        int data = top.data;
//
//        if (top.next != null) {
//            top = top.next;
//        } else  {
//            top = null;
//        }
//        return data;
    }
}

Queue.java

public class Queue {

    private class Node {
        private int data;
        private Node next;
        private Node (int data) {
            this.data = data;
        }
    }

    private Node head; // remove things here
    private Node tail; // add things here

    public boolean isEmpty() {
        return head == null;
    }

    public int peek() {
        return head.data;
    }

    public void add(int data) {
        // Create a new node
        // Set the current tail.next to point to this new node
        // Then set the new node to be the new tail

        Node newTail = new Node(data);
        if (tail != null) {
            tail.next = newTail;
        }
        tail = newTail;

        // handle case of first element where head is null
        if (head == null) {
            head = tail;
        }
    }

    public int remove() {
        // Point the head at the current head next
        int data = head.data;
        head = head.next;

        // Handle queue now being empty
        if (head == null) {
            tail = null;
        }

        return data;
    }
}

Binary Search Trees

Leave a comment

Some notes from this excellent video by Gayle McDowell on trees.

What is a binary tree?

A binary tree is a data structure where by each node has no more than two child nodes.

what-is-binary-tree.PNG

What is a binary search tree?

A binary search tree is a binary tree sorted in a specific ordering property. On any subtree, the left nodes are less than the root node, which is less than all of the right nodes.

This ordering makes finding a node really fast because we have pretty good idea of which way to go as we search the tree, going left or right and essentially halving our search area every step of the way.

what -is-binary-serach-tree.PNG

Inserts

Insert works much like a find. We start at the top, do a comparison, and then head down to the child subtree and repeat the process again until we get down to an empty spot or null element.

insert-1.PNG

Note the one problem with this approach is that if we get elements in a particular order we can get really unbalanced.

unbalanced.PNG

And this will make our find ands inserts slow.

There are algorithms that you can use to keep a binary tree balanced, but they are pretty complicated, and in an interview you can generally assume that you are going to have a balanced tree.

Traversinge

There are three common ways we walk through a tree.

Preorder traversal: Means you visit the root of the tree first, and then you visit it’s left nodes and it’s right nodes. Top-bottom, left to right.

Inorder traversal: Left nodes first, then right. So left, parent, then right.

Post order: Left, then right, then root.

order-traversal.PNG

Typically in binary search trees we walk to do inorder traversal because that actually allows the nodes to be printed out in order.

inorder-preferred.PNG

Code for Binary Search Tree

NodeBST.java

public class NodeBST {

    NodeBST left, right;
    int data;

    public NodeBST(int data) {
        this.data = data;
    }

    public void insert(int value) {
        // look to the left and the right to see where we want to insert
        if (value <= data) {
            if (left == null) {
                left = new NodeBST(value);
            } else {
                // push down to child and ask it to handle : recursion
                left.insert(value);
            }
        } else {
            if (right == null) {
                right = new NodeBST(value);
            } else {
                right.insert(value);
            }
        }
    }

    public boolean contains(int value) {
        // if we are there, return true
        if (value == data) {
            return true;
        } else if (value < data) {
            // then it should be on the left
            // if there is no left node
            if (left == null) {
                return false;
            } else {
                return left.contains(value);
            }
        } else {
            if (right == null) {
                return false;
            } else {
                return right.contains(value);
            }
        }
    }

    // left child -> parent -> right child
    public void printInOrder() {
        if (left != null) {
            left.printInOrder();
        }

        System.out.println("data = " + data);

        if (right != null) {
            right.printInOrder();
        }
    }
}

NodeBSTTest.java

import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

public class NodeBSTTest {

    private NodeBST node;

    @Before
    public void setUp() throws Exception {
        node = new NodeBST(10);
        node.insert(5);
        node.insert(15);
        node.insert(8);
    }

    @Test
    public void Contains() throws Exception {
        Assert.assertTrue(node.contains(5));
        Assert.assertTrue(node.contains(15));
        Assert.assertTrue(node.contains(8));
    }
    
    @Test
    public void PrintOrder() throws Exception {
        node.printInOrder();
    }
}

print.PNG

Binary Heaps

Leave a comment

A binary heap is a binary tree heap data structure used in sorting and priority queue algorithms.

What is a binary tree?

Screen Shot 2017-11-10 at 9.05.53 AM.png

A binary tree is a data structure that stores it’s data in the shape of a tree. At the top you have a root node, and underneath that you have at maximum two other nodes. One on the left and one on the right.

Binary trees are useful for all sorts of things. But one area where they really shine is in searching. When you sort a binary tree in certain ways, you can find elements much faster than if you were walking an array or linked list.

What are heaps?

A heap data structure is binary tree that is sorted. We call it a max heap (if the largest number is on the top). And a min heap if the largest numbers are on the bottom.

Screen Shot 2017-11-10 at 9.07.25 AM.png

What’s cool about max heaps is that their structure lends itself very nicely to quickly finding maximum level elements in a queue. Which is why priority queues are often implemented using binary tree structures sorted as heaps.

What is a binary heap?

As you’ve probably guess, binary heaps combine binary trees with heap sorted structures.

Screen Shot 2017-11-10 at 9.07.53 AM.png

In binary heaps any node has at least the value of the nodes children. There is no left/right orientation or distinction in values.

Also children further down the tree can be of greater value than nodes higher up the tree on the other side. That doesn’t matter. The only thing that matters is that the children at equal or less than their parents. That’s it.

Which means every node, it the root of it’s own sub-heap.

Heap representation

Because the shape of a heap is represented so nicely it is easy to store heap nodes in an array.

Screen Shot 2017-11-10 at 9.08.11 AM.png

Top to bottom, left to right. Also, because we know where every element of the array lies, we can access its elements very easily using the above formula.

Finding the maximum

Finding the maximum is easy. It’s simply the first, or highest element at the top of our heap. So it is always the first element in our array.

Inserting

When inserting we stick the new element at the end of our tree (always reading top to bottom, left to right) and stick it in the last element of our array. And if is smaller than the parent we are done.

But what happens if our element is larger that the parent?

Screen Shot 2017-11-10 at 9.08.26 AM.png

For insertions like this we walk up the tree, swapping nodes until our heap constraint (of the child being of less than or equal value to our parent) is satisfied.

Screen Shot 2017-11-10 at 9.08.44 AM.png

And if we insert a number each to it’s parent, we just leave it there. It’s done.

Deletion

Say we want to delete the root of our heap (i.e. we pulled off it’s max value and now we need the heap to resort itself – or heapify).

Screen Shot 2017-11-10 at 9.08.56 AM.png

To delete the max element, we take the max node, swap it with the last leaf, and then delete the last leaf which we know is so easy.

Screen Shot 2017-11-10 at 9.09.16 AM.png

Now we need to resort the heap (because that top element isn’t quite right as it doesn’t represent the max).

We do this by comparing the top with with each left/right subnode, and continuously swapping with the larger of the two.

Screen Shot 2017-11-10 at 9.09.46 AM.png

In this case the left child is larger, so we are going to swap the root to the left.

Screen Shot 2017-11-10 at 9.09.57 AM.png

No we repeat the process with the left hand subtree. We keep swapping the node in question with the largest of it’s children until we can’t swap anymore.

Screen Shot 2017-11-10 at 9.10.13 AM.png

At this point we say the heap is heapified, and finally sorted. It is good to go for another extraction.

Cool things

Some other cool things about binary trees is that this data structure can sort itself very efficiently. i.e. It can sort itself in place. No need to copy all the contents somewhere else and then copy back. When we heapify it does it all with the existing array and simply swapping nodes. Very cool. Very efficient.

Code

Here is an example of a Max Heap implementation for ints in Java.

MaxIntHeap.java

import java.util.Arrays;

public class MaxIntHeap {
    private int capactity = 10;
    private int size = 0;

    int[] items = new int[capactity];

    private int leftChildIndex(int parentIndex) { return 2 * parentIndex + 1; }
    private int rightChildIndex(int parentIndex) { return 2 * parentIndex + 2; }
    private int parentIndex(int childIndex ) { return (childIndex - 1) / 2; }

    private boolean hasLeftChild(int index) { return leftChildIndex(index) < size; }
    private boolean hasRightChild(int index) { return rightChildIndex(index) < size; }
    private boolean hasParent(int index) { return parentIndex(index) >= 0; }

    private int leftChild(int index) { return items[leftChildIndex(index)]; }
    private int rightChild(int index) { return items[rightChildIndex(index)]; }
    private int parent(int index) { return items[parentIndex(index)]; }

    private void swap(int indexOne, int indexTwo) {
        int temp = items[indexOne];
        items[indexOne] = items[indexTwo];
        items[indexTwo] = temp;
    }

    private void ensureCapactity() {
        if (size == capactity) {
            items = Arrays.copyOf(items, capactity * 2);
            capactity *= 2;
        }
    }


    public int extractMax() {
        if (size == 0) throw new IllegalStateException();
        int item = items[0];        // grab the max
        items[0] = items[size - 1]; // copy the bottom to the top
        size--;
        heapifyDown();             // and no because top isn't right, we heapify down
        return item;
    }

    public void add(int item) {
        ensureCapactity();
        items[size] = item;          // put in last spot
        size++;
        heapifyUp();
    }

    public void heapifyUp() {
        int index = size - 1;       // start at last element
        while (hasParent(index) && parent(index) > items[index]) {  // walk up as long as there is a parent and it is bigger than you
            swap(parentIndex(index), index);
            index = parentIndex(index); // walk upwards to next node
        }
    }

    public void heapifyDown() {
        int index = 0;              // starting at the top
        while (hasLeftChild(index)) {  // as long as I have children Note: Only need to check left because if no left, there is no right

            // pick a direction, and get the smaller of the two indexes
            int smallerChildIndex = leftChildIndex(index);
            if (hasRightChild(index) && rightChild(index) < leftChild(index)) {
                // swap right (because we are min heap
                smallerChildIndex = rightChildIndex(index);
            }

            // Now compare

            // if I am smaller than the items of my two children...then everything is good. I am sorted.
            if(items[index] < items[smallerChildIndex]) {
                break;
            } else { //  we are still not in order
                swap(index, smallerChildIndex);         // so swap with the smaller child
            }

            index = smallerChildIndex;              // then move down to smaller child
        }
    }

    public void print() {
        for (int i=0; i < size; i++) {
            System.out.println(i + "[" + items[i] + "]" );
        }
    }
}

MaxIntHeapTest.java

import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

public class MaxIntHeapTest {

    private MaxIntHeap minHeap;

    @Before
    public void setUp() throws Exception {
        minHeap = new MaxIntHeap();
        minHeap.add(6);
        minHeap.add(5);
        minHeap.add(4);
        minHeap.add(3);
        minHeap.add(2);
        minHeap.add(1);
    }

    @Test
    public void Insert() throws Exception {
        // Remember: The array walks top down / left to right
        Assert.assertEquals(1, minHeap.items[0]);
        Assert.assertEquals(3, minHeap.items[1]);
        Assert.assertEquals(2, minHeap.items[2]);
        Assert.assertEquals(6, minHeap.items[3]);
        Assert.assertEquals(4, minHeap.items[4]);
        Assert.assertEquals(5, minHeap.items[5]);
    }

    @Test
    public void ExtractMin() throws Exception {
        Assert.assertEquals(1, minHeap.extractMax());
        Assert.assertEquals(2, minHeap.extractMax());
        Assert.assertEquals(3, minHeap.extractMax());
        Assert.assertEquals(4, minHeap.extractMax());
        Assert.assertEquals(5, minHeap.extractMax());
        Assert.assertEquals(6, minHeap.extractMax());
    }
}

Links that help

https://cs.stackexchange.com/questions/27860/whats-the-difference-between-a-binary-search-tree-and-a-binary-heap
http://www.geeksforgeeks.org/binary-heap/

Priority Queues

Leave a comment

A priority queue is like a regular queue only it has one extra method

ExtractMax()

This method returns the highest priority item in the queue, and it’s useful for things like sorting when used in conjunction with binary trees.

A naive approach

Now a naive approach to implementing a priority queue would be to use an array and then walk the entire array, every time, looking for the highest priority element.

unsorted.PNG

Things can improve if you sort the array, then ExtractMax() is easy as it is simply the last element

sorted-array.PNG

And of course you could do this using a list also, through you would lose the ability to quickly find the max position as you can’t do binary search with a linked list.

sorted-list.PNG

So sorted Arrays and LinkedLists are two ways you could implement a priority queue. But if you really want to go fast, what you really need is a binary heap.

binary-heap-fast.PNG

These notes were taken from the excellent Coursera data structures course.

Older Entries

%d bloggers like this: