
What does reasoning about code even mean?

Unrelated image from pexels.com to make this post look nicer in social media shares.

Programmers talk about “reasoning about code” all the time, but what does that actually mean? Is it really just pretending to be a CPU and working out exactly when your for loop ends?

Yes and no. Figuring out exactly what your loops are up to is part of it, but at a higher level it’s also about how your methods, objects, modules, and functions all work together. Like this stackoverflow answer says, writing code isn’t just about cranking out something that compiles; you’ve also got to be able to figure out whether it’s actually right, whether it performs well enough, whether it can scale, and whether it’s vulnerable to bad data (malicious or accidental).

Sadly, tools can only help you so much with everything that comes after getting your code to compile. The compiler can tell you whether your code will compile, but it can’t tell you whether it does what you meant. Automated tests are a huge help with that, but those tests are code themselves, and you need to be able to reason about them too to make sure you’re testing what you meant to test and that the test itself isn’t broken.

To figure out what your code is doing, you need to be able to read and understand it well enough that you can predict what it will do for any given input. That’s basically all “reasoning about code” is. If you can reason about your code, you can change it or add more features without breaking things or spending hours and hours cursing at your computer and howling “whyyyyy won’t this work?!”

So being able to reason about your code is really useful, but how do you do it? Practice, mostly. If you’re a beginner programmer you should not worry even a little bit about understanding a whole codebase or heck, even a whole class. Focus on one method or one loop or one if statement at a time. When you get enough practice, you’ll be able to understand those really quickly and you can move up a level to figuring out what a whole method does with different inputs or how two methods in the same class work together.

Oh, and it really helps if the code you’re trying to reason about is good code (that is, well organized, has good names, etc). Some code is next to impossible to reason about because it’s full of giant methods or the variable names aren’t helpful at all or, and this is one of the worst problems, variables change all the time in unpredictable ways. A “total” variable that changes value each time through a loop isn’t so bad, it’s clear what it’s for and why it changes. On the other hand a “total” variable that gets reused for completely different totals at different places in a method is a real problem. Humans can only hold so many things in their working memory at once and “wait where does total change again?” takes up most of them. If you have a terrible time reasoning about a certain piece of code, it’s entirely possible the problem is not you but the code.

In cases like that when I don’t have time to refactor the code so it’s easier to read I find it really helpful to make a lot of notes and draw myself a map through the code. If you want to be really helpful you can even type up your notes / make a diagram out of your map and add it to your developer documentation.

Reasoning about code can be hard to do and takes practice, but it’s not some sort of magic that only “real” programmers can do.

How does a hash function work anyway?

A while ago I wrote about how hash maps work, but something’s been bugging me. How does the hash function do its thing? I know hash functions turn variable length data into fixed length data, but how do they do that? To be clear, I’m interested in the kind of hash you would use for a hash map; you would definitely want a more secure hash to keep your passwords safe.

Thanks to the magic of the internets, it’s really easy to find the function java uses to calculate a String’s hashcode.

/* Returns a hash code for this string. The hash code for a String 
object is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, 
n is the length of the string, and ^ indicates exponentiation. 
(The hash value of the empty string is zero.)
Returns: a hash code value for this object. */

public int hashCode() {
  int h = hash;
  if (h == 0) {
    int off = offset;
    char val[] = value;
    int len = count;

    for (int i = 0; i < len; i++) {
      h = 31*h + val[off++];
    }
    hash = h;
  }
  return h;
}

Okay great, that’s totally clear, right? ;)

Yeah, I have no idea what it’s actually doing either. But I can find out!

First of all, where are the values of hash, offset, and count coming from? They must be instance variables because they weren’t passed in as parameters. I poked around in the String code a little more and it turns out hash is defaulted to 0 when it’s declared, offset is set to 0 in the constructor, and count is set to the size of the string when it’s created.

Unrelated image from pexels.com to make this post look nicer in social media shares.

The first thing hashCode actually does is check whether hash is 0. If it’s not, then we know we already computed the hash and we can just return it and go on with our day. Makes sense, why do the same calculation over and over again when we can just do it once and store the result? I think that’s the same reason count is stored separately instead of just checking value.length when you need it. We know the length will never change because Strings are immutable, so why not save ourselves a lookup?

The next weird thing is how the method is adding a number to a char. Chars are characters, not numbers, aren’t they? Well, yes and no. According to the docs, a char is “a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).” That 0 to 65,535 part seems suspiciously like a number :) You can also test that out yourself in the Java REPL. It turns out Java will happily treat a char like an int if you ask it to.
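You can see that for yourself with a couple of lines in jshell or a throwaway main method (the variable names here are just mine):

char letter = 'a';
int asANumber = letter;          // no cast needed, a char widens to an int on its own
System.out.println(asANumber);   // prints 97, the Unicode (and ASCII) value of 'a'
System.out.println(letter + 1);  // prints 98 – arithmetic on a char gives you back an int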

The rest of it is pretty simple: we just loop through every character in the string, and each time through we replace the current hash with (31 * current hash) + current character.

Okay, but how does that map a string of any length to a hash code of fixed length? Shouldn’t a longer String always have a larger hash code? Not if your hashcode is an int! Java ints are 32 bits no matter what, so once the math overflows it just wraps around (often into negative numbers) instead of getting bigger. And because 2 and -821785444 are both ints, they take up exactly the same amount of memory, which means that no matter what size String you start with, the hashcode is always the same size.
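If you want to watch the wraparound happen, here’s a quick sketch – the exact hash values don’t matter, the point is that they’re all plain 32-bit ints:

System.out.println("hi".hashCode());
System.out.println("a much, much longer string that still has to squeeze its hash into one int".hashCode());
// Integer arithmetic in Java silently wraps around instead of failing:
System.out.println(Integer.MAX_VALUE + 1);   // prints -2147483648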

Another interesting little detail of how hashmaps actually use those hashcodes is that they rehash your hashes. If everyone used random Strings for keys then they wouldn’t need to, but because keys are usually Strings with some kind of meaning, the hashes for those keys won’t be evenly distributed. That is, a hashcode doesn’t have an equal chance of being any number from -2³¹ to 2³¹-1; you’re going to get clumps of hashes around some numbers because you’re more likely to use some Strings than others.

Great, but why does that matter? Performance! The more collisions you have (different Strings that happened to work out to the same hashCode), the more elements you need to look at to find the one you wanted and the worse your performance is. To get around that, java does some bitwise operations on the hashcode to reduce the number of collisions.
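If I’m reading the OpenJDK source right, in Java 8’s HashMap that extra step looks like this – it XORs the top 16 bits of the hash into the bottom 16 bits, so the bits that would otherwise get thrown away when the hash is squished down to a bucket index still get a say in which bucket you land in:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}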

Now we all have some idea what actually happens when you use a HashMap :)

How does binary search work, anyway?

Unrelated photo by Matthew Weibe to make this post look nice in social media shares.

Or, yet another blog post idea I stole from that article about programming interview concepts. You can find the rest of those posts under the how does it work? tag.

Binary search is an extremely simple idea that’s useful for much more than finding an element in a sorted array. The way binary search works is you compare the item you want to find with the item in the middle of the array; then, whichever “side” of that midpoint your item falls on, you compare it to the midpoint of that half, and you keep halving like that until you find the item you want (or run out of places to look).
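In code, a bare-bones version looks something like this (for real work you’d just call Arrays.binarySearch, this is only a sketch):

static int binarySearch(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  // the middle of the part we're still searching
        if (sorted[mid] == target) {
            return mid;                    // found it
        } else if (sorted[mid] < target) {
            low = mid + 1;                 // the target can only be in the right half
        } else {
            high = mid - 1;                // the target can only be in the left half
        }
    }
    return -1;                             // not in the array at all
}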

According to wikipedia a binary search makes at worst O(log₂ n) comparisons, which is pretty great when you have a large array. Because the search halves the search space each iteration, the maximum number of iterations you’ll need is the number of times you can divide your array in half. For an array of 100 items, you should only need 7 searches, and for an array of 200 items, you only need 8, and for 1000 items, only 10 searches. See how slowly the number of searches grows as the array gets a lot larger? That’s just cool!

Also cool: you can use that concept for more than just finding stuff in arrays. Back in college they taught us to narrow down where a bug in your code is using a binary search. Basically, comment out half of your code and see if the bug still happens. If it doesn’t, uncomment half of the commented half. If it does, comment out another half of the uncommented half. It always felt weird to do that, like I should’ve just been able to see the problem by reading the code, but it worked. If you’re new or just totally stumped, give it a try.

Of course, binary search isn’t the only way to find things. Hash maps can be even faster, but all they can tell you is whether your target item exists or not. If you want the next largest or smallest item when there’s no exact match, hash maps are no help at all.

There are binary search trees too (which I’ll go into more detail on in a future post), but ironically, even though they’re named binary search trees, binary searching an array is usually faster. The problem with binary search trees is that it’s hard to keep them perfectly balanced, so you might have more “halves” on one side than the other, which messes with your search efficiency. What binary search trees are good for is quick updates – it can be a real hassle to add or remove an item from an array, while trees are much easier to change.

There’s also plain old linear search, where you start at one end of your array and look at every item until you find the one you want. If you’re not going to search your array enough times for it to be worth sorting it, linear search is good enough.

That kind of tradeoff can actually make algorithm questions interesting. I still don’t care even a little bit whether you looked up how to reverse a binary tree before the interview, but I care a lot if you think to ask if you’re going to search that array enough times to recoup the cost of sorting it. Programmers, myself included, can be terrible about overengineering to solve problems that don’t actually exist. But more on that in another post!

How does big O notation work, anyway?

Unrelated image from pexels.com to make this post look nicer in social media shares.

You didn’t think I was going to stop beating that programming interview concepts post into the ground, did you? Unlike some of the other concepts, big O notation is something you actually need to understand to write good code. Even better, you don’t need to know the term big O notation to have a conversation about the underlying concept.

Big O notation, also known as the order of a function, is a way of describing how much worse a function performs when you give it more input. If you’re sorting a small enough list, for example, it really doesn’t matter how efficient your algorithm is. If your list is a million items long, then the efficiency of your algorithm suddenly matters a whole lot.

The specifics aren’t necessarily that interesting, but let’s run through them quickly. Some functions, like figuring out whether a number is even or odd, are constant, or O(1). No matter how big a number you give that function, it’s going to take about the same amount of time to figure out whether it’s even. Functions that take longer in direct proportion to the size of the input, like finding an item in an unsorted list, are called O(n) – the order is the same as the size of the input. Inefficient sort algorithms like insertion sort or bubble sort get much worse: O(n²). A bad sort has to compare each item in the list to every other item in the list, which means performance gets worse much faster than the input size grows. An extremely inefficient algorithm could even be O(n!), for example brute-forcing the travelling salesman problem.
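To make those a little more concrete, here are toy Java versions of each one (the method names are mine, not anything standard):

static boolean isEven(int n) {                 // O(1): the same amount of work no matter how big n is
    return n % 2 == 0;
}

static int indexOf(int[] items, int target) {  // O(n): might have to look at every item once
    for (int i = 0; i < items.length; i++) {
        if (items[i] == target) return i;
    }
    return -1;
}

static void bubbleSort(int[] items) {          // O(n²): compares (nearly) every item to every other item
    for (int i = 0; i < items.length; i++) {
        for (int j = 0; j < items.length - 1 - i; j++) {
            if (items[j] > items[j + 1]) {     // swap neighbours that are out of order
                int tmp = items[j];
                items[j] = items[j + 1];
                items[j + 1] = tmp;
            }
        }
    }
}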

In other words, nested loops are really, really bad :)

Understanding just how much worse an inefficient algorithm performs is a really important part of being able to make reasonable tradeoffs in your code. If your input is large enough, creating a binary tree out of it may be worth the extra memory usage for the improvement in time to find any given item. Assuming the computer will just magically handle it is not good enough when you may have to process hundreds of thousands of items or more.

As an interview question, I actually like big O notation – as long as you don’t insist on using those exact words to describe it. There’s no reason you need to use comp sci jargon to talk about which of two example functions would perform better with a massive amount of data to work on. It’s also something you can have an actual conversation about. Sometimes an inefficient algorithm is your only option, sometimes it can be avoided by using a different data structure, and sometimes you can avoid the issue entirely by finding a different way to meet the user’s needs. That’s much more interesting to talk about than whether your candidate thought to look up how a breadth first search works :)

How does quicksort work, anyway?

Unrelated image from pexels.com to make this post look nicer in social media shares.

Why yes, I am going to keep mining that article about stuff you should know for programming interviews for blog post ideas :) While I don’t think that a lot of the common interview concepts from that article are actually worthwhile to ask about in an interview, I do think they’re interesting bits of nerd trivia and going in depth into how stuff works shows that nothing the computer does is magic.

Sort algorithms in particular are a weird interview question because you should basically never implement one at work. There are always edge cases, but in general if you actually write a sort function you have done something bad and you should feel bad. The correct way to implement a sort function is to import a library and go on with your day.

That said, sort algorithms are interesting in their own right. They’re one of those things that seem incredibly simple and boring until you start thinking about how you would tell a computer how to sort things. There are also way more sort algorithms than you might think, all with their own pros and cons.

Quick sort uses a divide and conquer strategy – instead of sorting the entire array you give it, it picks a pivot point (different implementations do this in different ways; one of the simplest methods is just to take the middle element of your array) and rearranges the elements of your array so that everything less than the pivot is on the left and everything greater is on the right. Then you break the array into halves and recursively sort each one the same way until everything in the array is in order. There’s a really helpful gif at the top of the wikipedia article about quicksort that explains it better.
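Here’s a rough sketch of that in Java, using the middle element as the pivot like I mentioned – real library sorts are a lot smarter about picking pivots and handling small arrays. You’d kick it off with quickSort(myArray, 0, myArray.length - 1):

static void quickSort(int[] a, int low, int high) {
    if (low >= high) return;                 // zero or one element, already sorted
    int pivot = a[low + (high - low) / 2];   // the "just take the middle element" approach
    int i = low;
    int j = high;
    while (i <= j) {
        while (a[i] < pivot) i++;            // find something on the left that belongs on the right
        while (a[j] > pivot) j--;            // find something on the right that belongs on the left
        if (i <= j) {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;   // swap them
            i++;
            j--;
        }
    }
    quickSort(a, low, j);                    // recursively sort the left part
    quickSort(a, i, high);                   // recursively sort the right part
}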

Because quick sort rearranges the array elements by swapping them in place, it requires very little memory, which was a big deal when it was invented by Tony Hoare in 1959. To this day it’s one of the fastest sorting algorithms, provided you do a good job of picking your pivot point. If you do a bad job of that, things go off the rails, particularly if your array is mostly sorted already. In that case a naive pivot choice (like always taking the first element) puts almost everything on one side of the pivot every time, and quick sort degrades from its usual O(n log n) to O(n²), which is surprisingly slow.

Another efficient (in this case it’s a technical term for sort algorithms that are efficient enough to actually use) sort algorithm is merge sort. Merge sort is even older than quick sort – it was invented in 1945 by John von Neumann. Like quick sort, it uses a divide and conquer strategy; the difference is that merge sort divides the array into the smallest pieces it can, then merges those pieces into two-element arrays, then merges those into four-element arrays, and so on until it produces a completely sorted array. As usual, wikipedia has a gif that explains it visually.
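A minimal top-down version looks something like this – you can see where all the extra arrays come from:

static int[] mergeSort(int[] a) {
    if (a.length <= 1) return a;   // a single element is already sorted
    int mid = a.length / 2;
    int[] left = mergeSort(java.util.Arrays.copyOfRange(a, 0, mid));
    int[] right = mergeSort(java.util.Arrays.copyOfRange(a, mid, a.length));
    int[] merged = new int[a.length];
    int i = 0, j = 0, k = 0;
    while (i < left.length && j < right.length) {
        // <= keeps equal elements in their original order, which is what makes merge sort stable
        merged[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    }
    while (i < left.length) merged[k++] = left[i++];    // copy over whatever's left
    while (j < right.length) merged[k++] = right[j++];
    return merged;
}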

Merge sort requires much more memory than quick sort does because of the way it creates new arrays while it’s sorting. This can be an issue if you’re sorting especially large arrays, although I’m sure more advanced algorithms based on merge sort can do some sort of trickery to mitigate that :) On the upside, it’s a stable sort – if you have two objects in the array with the same sort order, they’ll stay in that order – unlike quick sort. It’s also good at handling slow sequential media like tape drives and handling linked lists, which quick sort is slow at and heap sort can’t handle at all.

Heap sort, the last sort algorithm I want to talk about today, is an interesting one. Unlike quick sort and merge sort, heap sort puts all the elements of the array into a heap first, then uses that to sort the array.

Quick digression from sorting: a heap is a partially ordered tree structure. In a min heap, each child node is always greater than or equal to its parent (in a max heap it’s always less than or equal to it), but siblings aren’t in any particular order relative to each other. The root node is always the smallest or largest element in the heap, and if you remove it the heap rebalances itself so the next smallest or largest element becomes the new root.

Back to heap sort: once you have a heap it’s very simple, you just take the root, add it to your array, let the heap rebalance itself, take the new root, and so on until your heap is empty.
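You can fake that in a few lines with Java’s PriorityQueue, which is a min heap under the hood. This isn’t a real in-place heap sort (the real thing builds a max heap inside the array itself), but it shows the take-the-root-and-let-it-rebalance idea:

static int[] heapSortish(int[] items) {
    java.util.PriorityQueue<Integer> heap = new java.util.PriorityQueue<>();
    for (int item : items) {
        heap.add(item);              // the heap rebalances itself as each item goes in
    }
    int[] sorted = new int[items.length];
    for (int i = 0; i < sorted.length; i++) {
        sorted[i] = heap.poll();     // the root is always the smallest thing left in the heap
    }
    return sorted;
}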

In comparison with other sorts, heap sort is a little slower than quick sort on average but has better worst case performance. Merge sort has similar time bounds (average, best case, and worst case time it takes to sort an array), but takes up more memory because a heap sort can be done in place. On the other hand, merge sort is stable, parallelizes well, and works on datasets too large to fit into memory at once, which neither quick sort nor heap sort can do.

One last piece of trivia: the Timsort algorithm, implemented in 2002 by Tim Peters, is based on merge sort and insertion sort (a very simple sort algorithm) and is the standard sort in Python and the standard sort for objects in Java.

There’s a huge amount of detail I skipped over, I recommend poking around wikipedia if you’re interested in more detail about the many, many, many ways you can sort a list. Just don’t ask about them in interviews, because all you’ll learn by doing that is whether your interviewee looked them up beforehand :)

How does a breadth-first search work, anyway?

In a recent post I mentioned having read an article about passing programming interviews that said it was important to be able to write a breadth-first search and to understand how hash maps work. I covered hash maps last time, so this time let’s talk about breadth-first searches.

The first question is what on earth is a breadth-first search? It’s a way of searching a tree structure. In a breadth-first search, you look at all the nodes at a particular ‘level’ of the tree before looking at anything in the next level. Another way you can do it is depth-first, where you follow each node’s children down and down until you run out of children, then go back up to the next child node you haven’t already visited and follow its children down until you run out again, and so on until you’ve visited all the nodes in the tree.

This is definitely a case where a picture is worth 1000 words. Here’s the order you visit nodes in a breadth-first search:

Order nodes are visited in a breadth-first search. By Alexander Drichel - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=3786735

and here’s the order in a depth first search:

Order nodes are visited in a depth-first search. By Alexander Drichel - Own work, CC BY-SA 3.0

Great, what’s a breadth-first search for? According to wikipedia it’s good for a bunch of problems in graph theory that I totally don’t understand, and some more understandable stuff like finding the shortest path between two nodes in a tree and serializing a binary tree in such a way that you can easily deserialize it.

So how do you do a breadth-first search anyway?

bijulsoni has graciously provided an example in their article Introduction to Graph with Breadth First Search(BFS) and Depth First Search(DFS) Traversal Implemented in JAVA on Code Project. If you’re interested, that code is provided under The Code Project Open License (CPOL) 1.02, which basically states that you can do whatever you like with the code but don’t come crying to them if it doesn’t work.

Here’s a breadth-first search:

 
public void breadthFirstSearch() {
    // BFS uses a Queue data structure
    Queue<Node> q = new LinkedList<>();
    q.add(this.rootNode);
    printNode(this.rootNode);
    rootNode.visited = true;
    while (!q.isEmpty()) {
        Node n = q.remove();
        Node child = null;
        while ((child = getUnvisitedChildNode(n)) != null) {
            child.visited = true;
            printNode(child);
            q.add(child);
        }
    }
    // Clear visited property of nodes
    clearNodes();
}

and to compare, here’s how a depth-first search works:

public void depthFirstSearch() {
    // DFS uses a Stack data structure
    Stack<Node> s = new Stack<>();
    s.push(this.rootNode);
    rootNode.visited = true;
    printNode(rootNode);
    while (!s.isEmpty()) {
        Node n = s.peek();
        Node child = getUnvisitedChildNode(n);
        if (child != null) {
            child.visited = true;
            printNode(child);
            s.push(child);
        } else {
            s.pop();
        }
    }
    // Clear visited property of nodes
    clearNodes();
}

The complete, runnable code can be downloaded from the article linked above if you’d like to run it yourself. getUnvisitedChildNode() does what you would expect, so I left it out to save space. What I find really interesting about the breadth-first and depth-first algorithms is that they’re practically identical except for the data structure used to hold the nodes we’re working on. The simple change from a queue (where you add items to the end and remove items from the head) to a stack (where you both add and remove items from the top) is all it takes to turn a breadth-first search into a depth-first search.

Now we all know how a breadth-first search (and a depth-first search as a bonus) works. You can safely forget the details, secure in the knowledge you can look it up if you need to :)

How does a hash map work, anyway?

I was reading this article about programming interviews a little while ago and one of the things they mentioned was that “A startlingly high percentage of interview questions reduce to breadth-first search or the use of a hash table to count uniques. You need to be able to write a BFS cold, and you need to understand how a hash table is implemented.” I saw that and started thinking wait a minute, how is a hash table implemented? I’m sure we covered that in college, but that was quite a while ago and I’ve forgotten a lot of stuff.

Unrelated photo from pexels.com

Quick note: Hash tables are also known as hash maps, dictionaries, associative arrays and probably more. I usually call them hash maps because that’s what they’re called in java.

For data structures, hash maps are really kind of cool. Under the hood, the data is actually stored in an array of linked lists. The interesting part is how the hash map decides where to put new elements. In a hash map, each array index is a bucket to put elements in, not the location of a single element. It uses a hash function (according to wikipedia, “any function that can be used to map data of arbitrary size to data of fixed size”, basically you give it an object and it gives you a number in a certain range) to decide which bucket to put any given item in. Hash % array length = array index to add item to. If there’s already an item in that bucket, it walks down the linked list to the last item and adds the new item to it. Because there can be multiple items in each “bucket,” the hash map creates an entry element that contains both the key and the value so it can find the right value for a given key when you ask for an element back.
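Here’s a stripped-down sketch of that idea – this isn’t how java.util.HashMap is actually written (for one thing, the real one resizes itself), just the bucket math described above with names I made up:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ToyHashMap<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    // the "array" of buckets, where each bucket is a linked list of key/value entries
    private final List<List<Entry<K, V>>> buckets = new ArrayList<>();

    ToyHashMap(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // hash % array length = array index to put the item in
    // (the & 0x7fffffff just keeps the index from going negative)
    private int bucketFor(K key) {
        return (key.hashCode() & 0x7fffffff) % buckets.size();
    }

    void put(K key, V value) {
        List<Entry<K, V>> bucket = buckets.get(bucketFor(key));
        for (Entry<K, V> entry : bucket) {
            if (entry.key.equals(key)) { entry.value = value; return; }  // key already exists, replace its value
        }
        bucket.add(new Entry<>(key, value));                             // otherwise add it to the end of the list
    }

    V get(K key) {
        for (Entry<K, V> entry : buckets.get(bucketFor(key))) {
            if (entry.key.equals(key)) return entry.value;   // only this one bucket ever gets searched
        }
        return null;                                         // not in the map
    }
}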

Dividing the whole collection of keys and values into buckets is how hash maps stay so fast. If you had to search the whole list of keys each time, it would take much longer to get an element back from the hash map, but because of the bucketing system you only have to look at the elements in one bucket. It’s also very fast to add elements because you don’t need to worry about resorting the list; you just find the right bucket and add your new element to the end of its list.

There is a complication, though: the more items in each bucket, the longer it takes to find an item. If your “buckets” fill up, you’ll need to expand the hash map by making a longer array and recalculating which bucket to put each and every item into. That can be a pretty bad performance hit if you already have a lot of items in your hash map. You also don’t want to avoid that problem by just starting with a huge array; that eats up memory for no good reason if you end up with so many buckets that most of them never have anything in them.

Because of the bucketing system, hash maps are a great way to count uniques – being able to quickly find the right bucket and look through only a few items means you can add items or see whether the hash map already contains the item you want to add. On the other hand, hash maps aren’t very useful if you care about the order of your items or if you’re just going to process all of them in a loop. Getting a list of every element in a hash map involves walking each linked list in each bucket, which can take some time (and memory!) if you have enough items. If you’re going to process everything in your collection, skip the overhead of a hash map and just use an array list or linked list.

Okay, so given all of those implementation details, why are hash maps interesting to ask about in an interview? While it’s generally a good idea to understand data structures because they’re so core to what we do as programmers, I’m suspicious they get asked about in interviews because those interviewers were asked about them. Unless you’re interviewing a recent grad and want to make sure they paid attention in class, I’m not convinced that you really learn anything interesting about someone’s ability to code by asking for more details than what a hash map is good for and when you should use one. I mean, it’s been years since I forgot how a hash map actually works and I manage to write code that does what I meant most of the time :)