melreams.com

Nerrrrd

How does a breadth-first search work, anyway?

In a recent post I mentioned having read an article about passing programming interviews that said it was important to be able to write a breadth-first search and to understand how hash maps work. I covered hash maps last time, so this time let’s talk about breadth-first searches.

The first question is what on earth is a breadth first search? It’s a way of searching a tree structure. in a breadth-first search, you look at all the nodes at a particular ‘level’ of the tree before looking at anything in the next level. Another way you can do it is depth-first, where you follow each node’s children down and down until you run out of children, then go back up to the next child node you haven’t already visited and follow it’s children down until you run out again, and so on until you’ve visited all the nodes in the tree.

This is definitely a case where a picture is worth 1000 words. Here’s the order you visit nodes in a breadth-first search:

By Alexander Drichel - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=3786735
Order nodes are visited in a breadth-first search. By Alexander DrichelOwn work, CC BY 3.0

and here’s the order in a depth first search:

Order nodes are visited in a depth first search. By Alexander Drichel - Own work, CC BY-SA 3.0
Order nodes are visited in a depth-first search. By Alexander DrichelOwn work, CC BY-SA 3.0

Great, what’s a breadth-first search for? According to wikipedia it’s good for a bunch of problems in graph theory that I totally don’t understand, and some more understandable stuff like finding the shortest path between two nodes in a tree and serializing a binary tree in such a way that you can easily deserialize it.

So how do you do a breadth-first search anyway?

bijulsoni has graciously provided an example in their article Introduction to Graph with Breadth First Search(BFS) and Depth First Search(DFS) Traversal Implemented in JAVA on Code Project. If you’re interested, that code is provided under The Code Project Open License (CPOL) 1.02, which basically states that you can do whatever you like with the code but don’t come crying to them if it doesn’t work.

Here’s a breadth-first search:

 
public void breadthFirstSearch() { 
    //BFS uses Queue data structure 
    Queue q=new LinkedList(); 
    q.add(this.rootNode); 
    printNode(this.rootNode); 
    rootNode.visited=true; 
    while(!q.isEmpty()) { 
        Node n=(Node)q.remove(); 
        Node child=null; 
        while((child=getUnvisitedChildNode(n))!=null) { 
            child.visited=true; 
            printNode(child); 
            q.add(child); 
        } 
    } 
    //Clear visited property of nodes 
    clearNodes(); 
} 

and to compare, here’s how a depth-first search works:

public void depthFirstSearch() {
    //DFS uses Stack data structure
    Stack s=new Stack();
    s.push(this.rootNode);
    rootNode.visited=true;
    printNode(rootNode);
    while(!s.isEmpty()) {
        Node n=(Node)s.peek();
        Node child=getUnvisitedChildNode(n);
        if(child!=null) {
            child.visited=true;
            printNode(child);
            s.push(child);
        } else {
            s.pop();
        }
    }
    //Clear visited property of nodes
    clearNodes();
}

The complete, runnable code can be downloaded from the article linked above if you’d like to run it yourself. getUnvisitedChildNode() does what you would expect, so I left it out to save space. What I find really interested about the breadth-first and depth-first algorithms is that they’re practically identical except for the different data structures used to hold the nodes we’re working on. The simple change from a queue (where you add items to the end and remove items from the head) to a stack (where you both add and remove items from the end) is all it takes to change a breadth-first search to a depth-first search.

Now we all know how a breadth-first search (and a depth-first search as a bonus) works. You can safely forget the details, secure in the knowledge you can look it up if you need to :)

Atom plugin of the day

Since I switched to Linux, I’ve had to find a replacement for my beloved notepad++. I went with Atom because I like the project view in the sidebar and I’m too cheap to pay for Sublime.

Highlights!
Highlights!

Because I miss notepad++ so much, I’ve been slowly trying to make Atom behave as much like it as I can. One very simple plugin that helps me do that is selection-highlight, which highlights all occurrences of a word when you double-click it. For something so simple it’s really helpful, and something I missed a lot when I switched to Linux. Sometimes you just want to know whether the id you logged in a few different lines is actually the same, and it’s really nice to be able to do that with only two clicks.

If you use Atom, I recommend the selection-highlight plugin, and if you don’t use Atom, I recommend trying it out. It’s free and there are about a bajillion plugins :)

How does a hash map work, anyway?

I was reading this article about programming interviews a little while ago and one of the things they mentioned was that “A startlingly high percentage of interview questions reduce to breadth-first search or the use of a hash table to count uniques. You need to be able to write a BFS cold, and you need to understand how a hash table is implemented.” I saw that and started thinking wait a minute, how is a hash table implemented? I’m sure we covered that in college, but that was quite a while ago and I’ve forgotten a lot of stuff.

Unrelated photo from pexels.com
Unrelated photo from pexels.com

Quick note: Hash tables are also known as hash maps, dictionaries, associative arrays and probably more. I usually call them hash maps because that’s what they’re called in java.

For data structures, hash maps are really kind of cool. Under the hood, the data is actually stored in an array of linked lists. The interesting part is how the hash map decides where to put new elements. In a hash map, each array index is a bucket to put elements in, not the location of a single element. It uses a hash function (according to wikipedia, “any function that can be used to map data of arbitrary size to data of fixed size”, basically you give it an object and it gives you a number in a certain range) to decide which bucket to put any given item in. Hash % array length = array index to add item to. If there’s already an item in that bucket, it walks down the linked list to the last item and adds the new item to it. Because there can be multiple items in each “bucket,” the hash map creates an entry element that contains both the key and the value so it can find the right value for a given key when you ask for an element back.

Dividing the whole collection of keys and values into buckets is how hash maps work so quickly. If you had to search the whole list of keys each time it would much longer to get an element back from the hash map, but because of the bucketing system you only have to look at the elements in one bucket. It’s also very fast to add elements because you don’t need to worry about resorting the list, you just find the right bucket and add your new element to the end of the list.

There is a complication, though: the more items in each bucket the longer it takes to find an item. If your “buckets” fill up, you’ll need to expand the hash map by making a longer array and recalculating which bucket to put each and every item into. That can be a pretty bad performance hit if you have enough items in your hash map already. You also don’t necessarily want to avoid that problem by using a very large array, that just eats up memory for no good reason if you have so many buckets that you never end up putting anything in most of them.

Because of the bucketing system, hash maps are a great way to count uniques – being able to quickly find the right bucket and look through only a few items means you can add items or see whether the hash map already contains the item you want to add. On the other hand, hash maps aren’t very useful if you care about the order of your items or if you’re just going to process all of them in a loop. Getting a list of every element in a hash map involves walking each linked list in each bucket, which can take some time (and memory!) if you have enough items. If you’re going to process everything in your collection, skip the overhead of a hash map and just use an array list or linked list.

Okay, so given all of those implementation details, why are hash maps interesting to ask about in an interview? While it’s generally a good idea to understand data structures because they’re so core to what we do as programmers, I’m suspicious they get asked about in interviews because those interviewers were asked about them. Unless you’re interviewing a recent grad and want to make sure they paid attention in class, I’m not convinced that you really learn anything interesting about someone’s ability to code by asking for more details than what a hash map is good for and when you should use one. I mean, it’s been years since I forgot how a hash map actually works and I manage to write code that does what I meant most of the time :)

Mongo tip of the day

Mongo can be very weird to adjust to if you’re used to “normal” (SQL) databases. One thing that tripped me up a little was discovering that mongo throws a DuplicateKeyException when you try to insert a duplicate into a field that has a unique index but is not a key. If you see that exception and there’s nothing obviously wrong with your _id field (mongo’s version of a primary key), have a look at any fields you have with a unique index.

Bonus tip: if you’re new to mongo I recommend their recorded Thinking In Documents webinar. It’s a pretty quick (1 hour) overview of how documents and querying work in mongo, I wish I’d found it before I spent so long fumbling around figuring mongo out on my own.

First blogiversary!

I realized the other day my blog is just over a year old. My very first post was a Play framework tip that took two whole sentences to explain. Since then I’ve published 71 more posts, go me! Turns out one or two posts a week over a year really adds up.

What I’ve learned from my year of blogging is that building a habit is way more important to blogging regularly than motivation is. Some weeks I really do not want to write a post, but I do it anyway because breaking the habit bugs me more than sucking it up and writing the damned post :) Seriously, if I waited until inspiration struck to write a post there would be less than a dozen of them on this blog.

Not that you shouldn’t take advantage of motivation when you have it! The big thing that helped me finally start this blog after ages of thinking I should really start blogging was starting a new job and learning all sorts of cool stuff I wanted to a) share, and b) make sure I didn’t forget. Writing things down helps me remember them, and even if I forget stuff I’ve blogged about, I know where to look for it.

Another thing I’ve learned is that explaining something is a great way to learn it. I hope my series of posts about SOLID design principles were useful to other people, but I probably learned more writing them than any of my readers did. You may think you understand something after you’ve read a few posts about it, but there’s nothing like trying to write your own explanation of it to show you where the gaps are in your understanding.

Over the last year my most popular post was Showing up is my secret superpower. It’s really cool that people liked something I wrote enough to share it. Sometimes stuff you throw out there because you don’t know what to write about really clicks with people.

Stay tuned, I’m planning for many more blogiversarys!

Learning How to Learn

I’ve been kind of skeptical about MOOCs (massive open online courses) for a while, but I took the Learning How to Learn course on Coursera and it was actually really good. Turns out there is free stuff that doesn’t suck :)

Photo from create her stock

Like any free resource, some MOOCs are great and some are… not. On the upside, the fact that they’re generally free (although some sites charge for a certificate of completion) means you aren’t out any money if you didn’t learn anything or were too bored to finish. The only problem is you’ll waste a lot of time if you’re too stubborn to give up on a course that’s not interesting. Do as I say, not as I do and all that :) I’m currently stubbornly hoping that I’m actually going to learn something in a very vague and high level software architecture online course I’m taking. Maybe it’ll get good in week 3?

The Learning How to Learn course, aside from being interesting on its own, is also a good example of what other courses should probably be doing. It teaches you specific concepts that you can actually go out and apply to whatever you want to learn right away. Some of the most useful things I learned in that course were the pomodoro technique and the fact that part of procrastination is that the brain reacts to a task you don’t want to do the same way it reacts to physical pain (at least, that’s how I remember it) but that if you can hang on through the initial pain, it goes away pretty quickly.

As a software developer I have to learn new stuff all the time, and because I’m totally irrational sometimes I worry that the thing I need to learn this time might finally be the thing that proves I’m just not smart enough. The Learning How to Learn course makes learning sound much less scary. Instead of learning being this mysterious thing that some people can do and some people just suck at, the course breaks down the process of how the brain makes sense of and stores new information. Learning is actually a simple process of repetition, chunking, using different modes of attention, and managing your time, it’s something anyone can do if they put in the time and effort and have learning materials that make sense to them.

It also has some tips for taking exams that I wish I’d had when I was still in college. If you have any important tests in your future, the course is worth taking just for the exam tips. I’m guessing here because I don’t have test anxiety, but I think it would be really helpful to have a simple plan to follow when your mind goes blank.

Learning about different modes of attention is also really useful for when you’re working on a really difficult bug. It’s hard to step away when you’re convinced you’ll get it if you keep trying (especially when the total refusal to give up is a pretty central programming skill), but taking your mind off the immediate problem really does help your brain to make the connection you need to solve it. When you’re too focused on a problem, your brain sort of sticks to the information you already have. When you take your mind off of it and think of nothing in particular, your brain can start making connections between things that didn’t seem to be connected at all and that’s where those sudden eureka moments come from.

The course is based on a book by one of the professors, A Mind for Numbers: How to Excel at Math and Science (Even if You Flunked Algebra). As much as I love books, I’d actually recommend taking the course first, because being tested on the material helps you learn it (a concept you’ll learn about in more depth if you take the course or read the book :) The due dates in the course also helped persuade me to make time to sit down and do it. The course also has transcripts for all the videos, so if you prefer reading to watching you’re covered.

If you want to learn things faster with less effort or you’re just interested in how your brain works, take the Learning How to Learn course.

Linux to Windows: why is everything terrible?!

Not so long ago, I wrote a post about my first impressions of Linux as a Windows user. A while after that, I booted back into Windows to get some files and oh god Windows is a terrible shock after you’ve gotten used to Linux.

To be fair, my Windows installation at home has fallen prey to the 100% disk usage bug, which means that every reboot I have to patiently wait until task manager deigns to load and I can kill processes until my computer starts working again, so Windows made a perhaps unfairly terrible first impression.

I will, however, blame Windows for their terrible update process. On Linux, I just tell the update manager to go ahead and install whatever updates it found and I go back to work while it takes care of things for me. It’s only been a few weeks, but I forgot how terrible the Windows update process is. Not only does my computer grind to a near halt while it’s installing updates, but it has to restart to finish installing the updates and it’s completely unusable while does that. And after installing the first batch of updates, it’s not unusual to find another batch that depended on the first one and have to go through the whole miserable process all over again. Ugh.

A good 20 minutes of updating later, the problem I was trying to solve (yay 100% disk usage, that’s not inconvenient at all) was still happening and I was sick of screwing around with Windows.

I guess the moral of the story is don’t go back to Windows once you’re used to Linux, it’ll just make you sad.

The interviewer is on your side

I’ve been thinking about interviews lately since I know a number of recent grads who are looking for work (ping me if you’re looking to hire a junior dev). Interviews can be really intimidating, especially when you’re trying to find your first development job. They still make me nervous and I’ve been at this programming thing for ten years.

photo of the skyline of a large city under an overcast sky
Unrelated photo from pexels.com

I’ve only participated in a couple of interviews from the employer side of the table and I’ve never run one myself, but I’m going to tell you what I learned anyway :)

One thing that you might not know is that interviewers don’t always get their other tasks cut back very much to make up for the time they spend interviewing. Don’t take that the wrong way, that doesn’t mean they hate doing interviews and want you to fail, it means they want to do the fewest interviews they can get away with and then stop.

Seriously, for the interviewer the ideal outcome of an interview is that the candidate is great, they can tell HR to send them an offer, and they can stop interviewing. Programmers don’t like getting interrupted anyway, and an interview and debrief with the other interviewers and then typing up notes for HR is a big chunk of your day.

The interviewer really and truly wants you to be great. It’s the most convenient way the interview can possibly go for them, and it really is awkward for everyone when an interview doesn’t go well. Nobody wants to watch someone crash and burn in an interview. Obviously that sucks a lot more for the interviewee, but it’s not really a great time for the interviewer/s either. Even if you’re not amazing, nobody wants to watch you fail.

Because the interviewer wants to stop interviewing, they’re looking for reasons to hire you, not reasons to turn you down. Nobody wants to make a bad hire, but ideally we can just make a good one and call it a day.

Another thing you should know is that culture fit, particularly if you’re interviewing at a startup, is extremely important. When the entire dev team is four people, the next hire makes a much bigger difference than it does on a team of twenty. On the one hand, that can make you nervous, but on the other hand, there’s nothing you can do about whether you’re a fit, so there’s really no point worrying about it. I know, I know, that doesn’t really work on me either. Do as I say and not as I do and all that :) You wouldn’t want to be on a team you don’t get along with anyway, so the best thing you can do is let your interviewers know who you are.

Something that took me quite a while to learn is that you are interviewing the company just as much as they are inte rviewing you. You’re not the only one who gets to make a decision here, the company you’re interviewing with needs to impress you too. I personally recommend asking about things like work life balance, whether they believe in crunch time, what their expectations for a new dev are, how they handle it when something goes wrong, exactly what time you’re expected to be in the office (startups are often flexible on this, but it’s good to know how flexible they are. Some offices also start at unusual times to stay in sync with head offices in other timezones), what the interviewer’s favourite thing about working for that company is, etc, etc. There are plenty of lists of standard questions to ask employers in an interview, take a look at a couple for inspiration and ask about the stuff that matters to you.

Employers really do like it when you ask questions, it shows that you care about which company you work for and aren’t just desperately trying to find anything that pays. Much like in dating, desperation is just not attractive. Employers like to feel special too :)

The one thing I really want everyone to take away from this blog post is that if the company is worth working for, the interviewer is on your side. They want you to be great so they can hire you and go back to work :)

Rubber duck debugging

Rubber duck debugging is one of those things that sounds completely ridiculous and is actually really helpful. To summarize the wikipedia page quickly, rubber duck debugging is when you figure out what’s wrong with your code by explaining it very carefully to an inanimate object.

You’re probably wondering why you would bother explaining your code to an object like a rubber duck or a teddy bear if you work as part of a team and you could just ask one of the other devs for help. Sometimes you should go directly to another developer. If there’s a serious bug in production that affects a lot of users, worrying about interrupting someone would be silly, it’s much more important to get production working again than to let Hypothetical Amy finish off her refactoring task before you bother her.

In other situations, it’s more important to give it a try yourself before interrupting anyone. The reason interruptions are such a big deal is that context switching is expensive and it’s even worse when switching to a programming task because of the number of details programmers have to “reload” into their working memory before they can get back into their work. Numbers on exactly how long it takes to get back up to speed after you’ve been interrupted vary, this Fast Company article says it takes a little over 23 minutes to get back on task but this New York Times article says it’s more like 25 minutes. If you ask someone to help you for just five minutes, it’s really not just five minutes, it’s also the time it takes for them to get back into what they were doing. That’s why programmers tend to get so cranky when you interrupt them :)

One way you can try to solve your own problem without interrupting anyone is rubber duck debugging. Having to explain all of the context around your problem, like what you’re trying to accomplish, what your code is supposed to do to get to that end result, what seems to be going wrong, where you think the problem is, what you’ve already tried, etc is one of the most useful parts of asking another person for help, and often the only part you need to solve your problem. Something about the process of explaining a problem to someone else helps you see parts that you missed before, whether it’s a log file you didn’t check or a logic error you didn’t catch. That explaining process doesn’t actually require a person to explain things to, it can work just as well if you explain it to the rubber duck, or the teddy bear, or a voice recording app on your phone or whatever works for you.

Personally, I like to write an email to the person I would ask for help if I really couldn’t figure it out myself. Putting a name at the top seems to help me get into the mindset of thinking about what that person would ask me about the problem and what they would likely suggest I try next. Most of the time I figure it out before I finish the email, but when I don’t, hey, I already have a nice tidy explanation of the problem that I can send to the person I would’ve asked for help anyway :)

If you’re stuck and don’t have anyone around to ask or just want to try everything you can before you ask for help, give rubber duck debugging a try. The worst case scenario is you end up with a good description of your problem that you could send to a friend of post in a forum.

Mongo tip of the day

The other day I learned that it’s possible to convert a field in a mongo collection to uppercase. I didn’t know that was a thing until I went digging, so in case it’s news to anyone else, here’s the stackoverflow link. While I’m at it, you can use regexs in mongo too, but of course they’re slow so don’t go nuts with them.

Now we both know just enough mongo to really get ourselves into trouble :)