Mel Reams

Nerrrrd

How does a hash function work anyway?

A while ago I wrote about how hash maps work, but something’s been bugging me. How does the hash function do its thing? I know hash functions make variable length data into fixed length data but how do they do that? To be clear I’m interested in the kind of hash you would use for a hash map, you would definitely want a more secure hash to keep your passwords safe.

Thanks to the magic of the internets, it’s really easy to find the function java uses to calculate a String’s hashcode.

/* Returns a hash code for this string. The hash code for a String 
object is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, 
n is the length of the string, and ^ indicates exponentiation. 
(The hash value of the empty string is zero.)
Returns: a hash code value for this object. */

public int hashCode() {
  int h = hash;
  if (h == 0) {
    int off = offset;
    char val[] = value;
    int len = count;

    for (int i = 0; i < len; i++) {
      h = 31*h + val[off++];
    }
    hash = h;
  }
  return h;
}

Okay great, that’s totally clear, right? ;)

Yeah, I have no idea what it’s actually doing either. But I can find out!

First of all, where are the values of hash, offset, and count coming from? They must be instance variables because they weren’t passed in as parameters. I poked around in the String code a little more and it turns out hash is defaulted to 0 when it’s declared, offset is set to 0 in the constructor, and count is set to the size of the string when it’s created.

Unrelated image from pexels.com to make this post look nicer in social media shares.

The first thing hashCode actually does is checks if hash is 0. If it’s not, then we know we already computed the hash and we can just return it and go on with our day. Makes sense, why do the same calculation over and over again when we can just do it once and store the result? I think that’s the same reason count is stored separately instead of just calling value.length() when you need it. We know the length will never change because Strings are immutable, so why not save ourselves a lookup?

The next weird thing is how the method is adding a number to a char. Chars are characters, not numbers, aren’t they? Well, yes and no. According to the docs, a char is “a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).” That 0 to 65,535 part seems suspiciously like a number :) You can also test that out yourself in the Java REPL. It turns out Java will happily treat a char like an int if you ask it to.

The rest of it is pretty simple, we’re just looping through every character in the string and adding (31 * current hashcode) + current character to the existing hashcode.

Okay, but how does that map a string of any length to a hash code of fixed length? Shouldn’t a longer String always have a larger hash code? Not if your hashcode is an integer! Those just roll over into negative numbers if you add too large of a number to them. And because 2 and -821785444 are both integers they take up the same amount of memory, which means that no matter what size String you start with, the hashcode is always the same size.

Another interesting little detail of how hashmaps actually use those hashcodes is that they rehash your hashes. If everyone used random Strings for keys then they wouldn’t need to, but because keys are usually Strings with some kind of meaning, that means the hashes for those keys won’t be evenly distributed. That is, a hashcode doesn’t have an equal chance of being any number from -231 to 231-1, you’re going to get clumps of hashes around some numbers because you’re more likely to use some Strings than others.

Great, but why does that matter? Performance! The more collisions you have (different Strings that happened to work out to the same hashCode), the more elements you need to look at to find the one you wanted and the worse your performance is. To get around that, java does some bitwise operations on the hashcode to reduce the number of collisions.

Now we all have some idea what actually happens when you use a HashMap :)

Link of the day

This post titled How I Ruined Office Productivity With a Face-Replacing Slack Bot (Without Really Knowing What I Was Doing) has been floating around the internets lately so you may have already seen it. Even if you have, it’s an interesting example of how a programmer breaks down a new problem and experiments with the pieces before putting it all together. I especially like how that post shows the tiny experiments Jason started with and how those built on each other.

Do you have to be good at math to be a programmer?

Slightly related image from pexels.com to make this post look nicer in social media shares.

One of the most common misconceptions I’ve heard about programming is that you have to be a math whiz to be a good programmer. I’ve mentioned this before but I want to attack that particular stereotype more directly.

It is true that there are similarities between math and programming – programming uses some math terms like function and variable, it’ll be a lot easier to get things onto the screen if you understand coordinate systems, and of course to do math or programming you spend a lot of time manipulating symbols and thinking abstractly, but you know, programming is also a lot like language. I mean, they call them programming languages for a reason. To quote Hal Abelson, Jerry Sussman and Julie Sussman’s book Structure and Interpretation of Computer Programs:

First, we want to establish the idea that a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.

Programming is fundamentally about communication. Not only are you communicating with the machine, you’re communicating with your future self, the rest of your team, and potentially other teams if you end up working for a large enough company. And that’s just when you’re writing code! You’ve also got to make sure you’re building the right thing and let other people who are waiting for that thing know how it’s going. You know what’s great practice for communication? That’s right, English classes! Sarah Mei has a great blog post about the same point I’m trying to make in this one, here’s an especially interesting quote:

The students I saw – all adults – came from a wide range of backgrounds. People with a math background did fine, of course, but people with a heavy language background often did better.

Nobody seems to have an official study on this, but I think there’s something there. Language requires just as much abstract thought as math does, with the added benefit (to your communication skills) of failing your classes if you can’t make yourself understood to anyone else. As a bit of an aside, the biggest problems I’ve run into at work have never been problems of pure programming skill (although those ones do suck a lot), they’ve been communication problems with team members, other teams, management, other departments entirely and other offices.

And to stick another nail in the coffin of the idea that programming is math, I’ve been a developer for ten years now and the most complicated math I’ve ever had to do at work is stuff like “if I have a div that’s x pixels wide and y pixels tall and I want it centered inside another div that’s a pixels wide and b pixels tall, what should my margins be?” I used to believe that programming was math, and some programming is certainly math heavy, but it’s hard to argue with the lack of math in my day to day job over ten years and four companies.

So given that programming is not math, why do so many people think it is? I’m going to quote Sarah again because she already said it really well:

When programming was just getting started, early in the last century, we used it to solve highly mathematical problems like calculating missile trajectories and decrypting secret messages. At that point, you had to be good at math to even approach programming. Tools, such as programming languages, were designed specifically to solve mathematical problems, because those were the ones we thought it was worth spending money on. Computers were for doing math.

I think another issue is the way we teach programming (credit where it’s due, I stole that idea directly from a comment on Sarah’s post). Computers are great at doing boring and tedious calculations, plus it’s easy to tell them how to do it, so why not start teaching programming by having students make the computer calculate something for them? It makes it really easy for the students to check their work too – assuming they’re able to do the math problem themselves, they can easily check if the computer got the same answer.

That assumption can be a real problem, though. If you can’t do the math problem or don’t care about the math problem, you end up effectively shut out of programming, something you could very well be great at, because of something that has next to nothing to do with the actual job of software development.

If you can do basic arithmetic and deal with the concept of variables, congrats, you can be a programmer! That is literally all the math you actually need!

Be a better programmer while still having a life: part 6

Unrelated image from pexels.com to make this post look nicer in social media shares.

Willpower is a big deal when you’re doing an inherently difficult and frustrating task like programming. It’s also something we tend to assume we either have or we don’t, but there are actually a lot of things we can do to improve our chances of making it through a willpower challenge whether that’s making yourself write boring documentation or resisting the urge to screw around on reddit all afternoon instead of doing your work.

Kelly McGonigal gave a talk called The Willpower Instinct (not so coincidentally also the title of one of her books) about that exact subject at Google. It’s a really interesting talk and I recommend watching it, but it’s also an hour long so I’m going to list the takeaways here. Full disclosure, some of these things will take up a little bit of time outside of work, but they will definitely not eat your life and will be useful for way more than just being a better programmer.

1 Sleep! Just getting enough sleep makes it much easier to make decisions that support your long term goals/resist choices that sabotage your long term goals. Meditation helps too, that was the tool used to improve sleep in the study Kelly McGonical referenced. That study was done on people in rehab for drug addictions, and even with an urge that strong to use drugs, simple sleep and meditation made the partipants much more resistant to relapse than the control group. Something really cool about that study was that the participants got those fantastic benefits from meditating for only 10-15 minutes a day.

There’s a lot of really interesting stuff in the talk about how different parts of your brain actually work together (or don’t) when you need to use your willpower to delay gratification or work toward a distant future goal, but I’m not out to transcribe the whole talk here, you’ll have to watch it for all the details :)

2 Stop beating yourself up when you make a mistake. Beating yourself up doesn’t actually improve your willpower at all. It turns out that the harder you are on yourself when you have a willpower failure, the sooner you fail again and the worse it will be next time. The worse you make yourself feel, the more you want comfort, and the more you want comfort, the more likely you are to turn to the exact thing you’re trying to stay away from, whether that’s chocolate, alcohol, cigarettes, or excessive youtubing. It actually works much better to have compassion for yourself and just try to do better next time.

3 Learn to identify with your future self. Most of us feel like our future self is a stranger, and the more you feel like that, the less likely you are to do things to protect that stranger’s health and happiness (like exercising or saving for retirement). Who here has cursed their past self for not commenting their code? We all know that we should, but we hardly ever do it. I think a big part of that is that we somehow don’t believe the frustration our future selves will feel when they have to work on another piece of badly commented code is as bad as the frustration our present selves feel.

One way you can get to know your future self is by writing a letter from them to your present self. This can either be a general letter about who future self is and what they’re up to, or a more specific letter thanking your present self for doing the work of exercising more, quitting smoking, or actually commenting that code even though it seems perfectly clear right now.

4 Imagine yourself failing. It’s totally counter intuitive, I know! The idea is to think about how things could go wrong for you so you can come up with workarounds. Say you want to exercise more, if you know that you’re likely to put things off until it’s too late at night to go for a run, you could leave yourself notes or set reminders to make sure you actually go for that run. If you keep track of what goes wrong for you and basically become a detective of your own failure, pretty soon you’ll have a comprehensive set of workarounds / ways to avoid the things that trip you up.

It’s also really helpful if failure doesn’t come as a shock to you – it’s easier to shrug it off and try again if you’re a little bit pessimistic and go “okay, I know I’m going to fail at this a few times, what am I going to do when that happens?” Sure, it seems obvious that the answer is “get back to the thing I was doing to reach my goal” but if failure is a surprise then it’s really easy to decide, “welp, I missed two workouts this week, I’m fundamentally a failure, no point in trying again so I might as well sit on the couch all weekend watching tv” even though two workouts are just not a big deal in the grand scheme of things and you could easily get back on the wagon by working out on the weekend.

5 Learn to tolerate discomfort. In her talk Kelly McGonical mentioned that the ability to hold your breath for 15 seconds is actually a very strong predictor of how well you will do in willpower challenges. There’s a technique called “surf the urge” that relies on that idea. When you “surf the urge” you give your full attention to your craving and trust that if you just sit with it for a while, it will pass. Immediately trying to repress the urge, whether it’s to have a chocolate bar or to screw around on reddit for three hours, just teaches you that the discomfort you’re trying to avoid is intolerable and you Absolutely Must Do Something About It Right Now. That’s just not true, you’re a grownup, you can ride it out. What you do by surfing the urge is break the link between feeling stress and opening that tab, which makes it much much easier not to give in.

And finally, here are some much less evidence-backed tips from me:

A Outsource your willpower if you can – browser extensions like stayfocusd (I haven’t found a really good mobile app yet, recommendations welcome!) that just don’t allow you to spend all afternoon screwing around on Facebook mean you can take all the energy you would have spent resisting the urge to open that tab and spend it on something else.

B Build habits. Once something is a good solid habit it takes way less willpower to keep doing it. Flossing, for example, is boring and a hassle but once it’s a habit it feels way more weird not to do it than to just get it over with. I recommend BJ Fogg’s Tiny Habits course if you want to make more things a habit so you can stop thinking about them.

Be a better programmer while still having a life: part 5

Unrelated image from pexels.com to make this post look nicer in social media shares.

Today’s tip for becoming a better programmer while still having a life isn’t just good for you, it’s good for your whole team. That tip is documentation.

I know, nobody actually likes doing documentation. Fortunately, I’m not talking about dry design docs or endless specifications, I’m talking about a simple wiki (or whatever works for you), written by you and your team for you and your team. It doesn’t have to be polished, it doesn’t have to be formal, it doesn’t even have to be spelled perfectly. All it needs to be is correct and understandable. The idea is to explain things just like you would to another dev on your team, not to waste hours proofreading.

Great, but what should you document? In short, anything you’re going to want to know later. Especially useful things to write down are why architectural decisions were made, how parts of the system work at a high level, guides to testing certain features (great for complicated payment provider integrations, not that I’ve been fighting with those), and especially guides to setting up your development environment, whatever that thing is that you forget every time and have to look up again or ask someone about.

Just saving yourself time relearning parts of your codebase will help you get things done faster, which is basically being a better programmer, but explaining things (even if you’re writing to yourself) helps you understand them better. Explaining something forces you to think about every little detail, and putting something in your own words helps it stick in your brain – it’s the same principle that helps you remember things better when you take notes on them, whether or not you ever look at those notes. The better you understand your codebase, the better decisions you can make about how to add new features or fix bugs.

As a bonus, reflecting on why you made certain decisions and being able to go back later and see if the reason you did things the way you did holds up over time also makes you a better developer. You don’t even need to make a special effort to do this, if code gets used it gets changed. All you need to do is wait, eventually you will have to revisit the code you documented and by the time you do that, you will probably have forgotten the details and will need to go look it up in that handy wiki you created :)

Writing things down also helps you get that answer again without interrupting anyone, and the less you can interrupt teammates the better considering how expensive interruptions are for programmers. I couldn’t find a hard number, but I feel comfortable saying that it takes at least ten minutes to get back into a complex task like programming once you’ve been interrupted. It can be even worse if you get interrupted at just the wrong time. I have a terrible time getting into a task when I know I’ll have to stop soon for a meeting or appointment or whatever, which means an unfortunately timed interruption can kill a solid half hour of productivity even if it only took a couple of minutes to deal with the interruption itself.

All the benefits you get from making knowledge about your code available and searchable also apply to your teammates. If you just document how to set up your development environment, that can save hours every time someone has to set up their environment whether it’s a new employee starting or a long time employee getting a shiny new computer to replace an old one. It’s even better for everyone around you if the whole writing things down idea catches on – then they also get the benefits of explaining things and documenting the particular things they forget and need to look up as well.

And best of all, this is another work thing you can do at work that has no effect on your personal time.

JUnit tip of the day

Fun fact about JUnit tests: if something throws an exception that prevents your test from completing normally, it can’t clean up after itself. Normally this isn’t a big deal but if, for example, your setup method adds any test data to your database or creates a whole new database, you’re going to need to clean that up manually. Turns out extra databases eat up a lot of hard drive space if you don’t tidy them up for, uh, months. Just fyi :)

Be a better programmer while still having a life: part 4

Unrelated image from pexels.com to make this post look nicer in social media shares and because red pandas are adorable.

As much as we would all like to believe that programming is about logic, not feelings, being able to deal with your emotions is incredibly important if you want to be a better programmer. For example, one of the best things you can possibly do for you career is to learn to take criticism. Logically everyone should be thrilled to get feedback on their work but you know what gets in the way? Yep, it’s emotions.

It can suck to hear all about what’s wrong with your code, especially if you worked really hard on it and thought that this time you finally got it right. Hearing about what you need to improve is incredibly helpful (who has time to make every possible mistake on their own? that would take forever!), but that doesn’t mean it never hurts your feelings. And when people’s feelings are hurt, it’s really hard for them to listen. Sometimes emotions get in the way of your improvement as a developer.

So how do you learn to take feedback even when it hurts?

First of all, you’ve got to accept that it’s normal to have feelings about being told that your code isn’t good enough. There’s no fixing a problem that you can’t admit is happening, after all. That absolutely does not mean you should be ashamed of having feelings – not only is it just plain wrong to shame anyone, including yourself, for being human, but it’s a huge waste of time. All it means is that if your feelings are keeping you from being able to take feedback, you’ve got to get a handle on them. You would debug a program that couldn’t handle certain inputs, right? Sometimes your mind needs debugging too.

One way to do that is to remind yourself that feelings aren’t facts. Just because you feel attacked or like everything you do is stupid and terrible and you’re never going to be any good at this and should just crawl into a hole doesn’t make it true. That’s just your brain freaking out, if you give it a minute it will calm down. Yes, it sucks in the moment to hear that your code needs work, but you’re going to feel fine tomorrow. With practice, you’ll get used to hearing what you can improve and start skipping the boring angsting step to go directly to fixing things.

A common thing programmers do that makes that way harder is to get really really really attached to their code. You should absolutely care about whether you’re doing a good job but you’ve got to remember that your code is not you. The quality of your code has nothing to do with your worth as a human being and any one piece of code doesn’t even mean much of anything about your skills as a programmer. Everyone has off days, making a mistake does not mean you’re a bad programmer, will always be a bad programmer and should just go find a hole to crawl into. Everybody makes mistakes, you’re not that special ;)

Quick side note: code reviews are never supposed to be mean. Aside from being a tremendous dick move and thoroughly unprofessional, it’s a waste of time to be a jerk when you’re giving feedback. If people hate doing code reviews they will find ways to not do them and the quality of code on your project will suffer. If you can’t be nice because it’s right, be nice because it’s effective. And if your team lead/senior dev/whoever does your reviews can’t at least be civil, find a new job. Life is too short to put up with dicks.

Something else you can remind yourself about when you’re having feelings about a code review is that nobody gives advice to people they don’t believe can do better. If you believe someone can’t learn you don’t bother giving them a code review, you quietly fix what you can while they’re not looking and hope they eventually get fired. Bothering to give someone feedback is a statement that they’re worth the effort.

The most important piece of advice I can give you, though, is that you need to be able to take a step back both from your feelings. Meditation is great for this, by the way. I’m going to steal Headspace’s traffic metaphor here because it’s really good: training your mind isn’t about forcibly clearing it, it’s about learning to watch your thoughts and feelings go by like cars on the road and not needing to run out into the street and direct traffic. Having a feeling doesn’t mean you have to do something about it right then, you can wait a bit and see if it’s still important.

That stepping back thing is useful for more than just code reviews, too. Professional software development can be really frustrating for reasons that have nothing to do with writing code. Features get changed or cut, projects get dropped, companies change direction, sometimes things never make it into production no matter how much you believe in them or how hard you try. If you can’t step back from your attachment to a project and accept that things don’t always go the way you wish they would or even the way they logically ought to, you’re going to have a bad time in this field (or any other for that matter).

While you’re at it, literally step away from the computer now and then and do something else. Not only can you become a better programmer and still have a life, but having a life will make you a better programmer. Nerds tend to hyperfocus, which is great when you have a concrete task to get done but not quite as great when you lose all sense of perspective because you do nothing but code all day. Having other things you’re good at and other things to look forward to really helps put that one bad code review into perspective.

Be a better programmer while still having a life: part 3

Unrelated image from pexels.com to make this post look nicer in social media shares.

More stuff you can do to be a better programmer while still having a life!

The core of programming is really problem solving, but we’re kind of expected to pick it up as we go while we’re learning specific skills like programming in java. I have a strong suspicion that’s why so many new developers feel totally lost when they try to build something on their own: we’ve collectively done a bad job of teaching them how to break down problems and solve them.

Problem solving is a gigantic topic so I’m not pretending this blog post is going to be comprehensive, I’m just aiming to share a few tips to get you started.

First of all, don’t just start coding. It really helps to have some sort of plan for what you’re going to do. If you dive in and just start coding without any sort of plan you’ll often end up lost in the weeds because figuring out how to solve a problem and how to code that solution at the same time is so much harder than doing those things one at a time.

Ironically, it’s also hard not to do those things at the same time. It’s really common to feel like you’re wasting time planning and making lists when you should be writing code, or that planning is boring and no plan is ever perfect so why bother? It’s also not at all unusual for programmers to start coding without a plan because they’re worried about whether they can solve the problem, so they just start writing code and hope the complete solution will come to them.

Having a plan actually speeds things up because it cuts down on the feeling your way around in the dark that you end up doing when you don’t know where you’re going or how to get there. If you really can’t stand the planning part, programming may just not be for you – solving problems is the real job, code is just the tool we use to solve them. Making a plan may force you to admit that you just don’t know how to solve a certain problem which sucks, I’m not going to lie, but isn’t it better to figure that out right away instead of potentially spending days or weeks coding only to find out that your partial solution is never going to work? Trust me, that sucks way more.

So that’s all great, but how exactly do you break down a problem and make a plan to solve it?

One example is laid out in 5 Steps to Solving Programming Problems by Adrian Prieto. The step I want to concentrate on is “2. Solve the problem manually.” Seriously, write it down exactly the way you would do it if you had to do it manually. If part of the problem involves looking things up in the database, just pretend you have a gigantic binder or that you can call another department for the data you need. The imaginary manual procedure might sound really awful and tedious but that’s fine, computers are great at boring and tedious.

Example time! A common programming challenge question is to write a palindrome checker – you give it a string, it tells you whether or not it’s the same backwards and forwards. It’s really easy to look at a short palindrome like racecar and see that it’s obviously the same backwards and forwards, so let’s imagine we have a much longer string. Now how do we tell whether or not it’s a palindrome? One way is to look at the first character and see if it’s the same as the last character, then the second character and the second last one, and keep doing that until we get to the middle of the string (let’s assume for now that we don’t have to worry about spaces or punctuation). If we make it to the middle of the string it must be a palindrome, so we can return true. If any of the pairs of characters don’t match then it’s definitely not a palindrome, so we can stop checking right there and return false.

Now that we have a manual process, we can start adapting it for a computer to run. The thing you have to remember is that computers are really dumb. If you tell a person “stop when you get to the middle of the string” they’ll know what you mean and what to do if the string has an odd number of characters so there’s only one in the middle. Computers, on the other hand, don’t know what a “middle” is unless you tell them and don’t know what to do with a string with an odd number of characters either. For that matter, they don’t know how to tell they’ve gotten to the middle of the string.

In the interests of not making this post 5000 words long I’ll skip the specifics of telling a computer how to find the middle of a string, the important thing to know is that once you have an overall process you can dive into subproblems like “how can the computer tell when it has checked enough characters?” without losing sight of why that problem even matters and how it fits into the rest of your solution.

It really doesn’t matter how bad your first try at a process is. Writing down how you would do it manually is always good enough for a first try, and once you have a basic process you can start improving it or even trying totally different algorithms. The great thing about a simple set of point form instructions or even pseudocode is that it’s much quicker and easier to change than actual code, so you can mess with it to your heart’s content.

That works great for straightforward problems, but what do you do with an overwhelmingly large problem where it’s not obvious where to start? Let’s say you want to build a todo list app. Should you start by designing the UI? Or by figuring out what your data model should look like? How do you decide which features your app should have?

The question about what features you should have is one of the simpler ones: build the tiniest thing that could possibly work. You can always add more features later. In the case of a todo list app, I would start with being able to add new todos and being able to cross off existing ones. Don’t worry about subtasks, don’t worry about due dates or recurring tasks or anything else, just start with the simplest thing that could possibly be considered a todo list.

As for where to start, if I knew the right way to approach every problem I’d be a millionaire :) I personally like to start with the UI so that once I get down to the data model I already have a list of all the visible information that I definitely need to store. On the other hand, if you’re especially comfortable with data modelling it can be easier to start there. That can also save you from building a database that’s perfect for one screen and kind of terrible for everything else. It really doesn’t matter where you start as long as you remember to look at your overall design from different angles and see if it still makes sense.

At the “boxes and lines” diagram stage just like the point form description of a manual process stage you lose hardly any time at all if you sketch out a solution, figure out it doesn’t handle something well, then throw it out and start over. The important thing is to learn more about the problem, not to come up with a perfect design on the first try.

Like communication, practising problem solving doesn’t have to take up a single hour of your time outside of work. You can get all the practice you need on work problems while you’re at work and still have a life outside of work :)

WordPress Appliance - Powered by TurnKey Linux