melreams.com

Nerrrrd

Things they don’t tell you in school about production code

Unrelated image from pexels.com to make this post look nicer in social media shares.

One of many things school can’t really prepare you for is what it’s actually like to write production code. That’s not a knock on my education or anyone else’s, it’s just not possible to get the experience of writing production code without, you know, writing production code. That said, I’m going to try to explain it anyway :)

Like I said in When is it done?, when I was in college I thought “done” meant “compiles and seems to give the right answer for a couple of happy-path tests.” Actually “done” in a meaningful sense is much more than that, and so is writing code that’s really, for really real, ready for production.

Having code that compiles and probably even works is all well and good, but how will you know how it’s performing or whether it’s working right in production? This is the kind of thing you don’t think about for school assignments, you just hand them in and then you’re done with them forever. Logging suddenly becomes a really big deal when you need to know whether your code is working right and if not, what you need to fix.

All the details of how to log, when to log, what you should log have filled many books, so I’ll just say that it takes practice to figure out how much logging is enough but not too much and that you should feel free to lean on your senior devs to help you with that. At the very least, everything you log should include some information about the user who was logged in at the time and the account/project they’re part of if that applies. Knowing that something happened isn’t terribly helpful if you don’t know anything about the context it happened in. If in doubt, err on the side of more information. You can always filter it out or just ignore it if you need to.
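To make the "include the user and account" advice a bit more concrete, here is a minimal sketch using Python's standard `logging` module. The `ContextAdapter` class, the `logger_for_request` helper, and the `user_id`/`account_id` names are all made up for illustration; your framework may already have its own way to attach this kind of context.

```python
import logging


class ContextAdapter(logging.LoggerAdapter):
    """Prefix every message with the user and account it belongs to."""

    def process(self, msg, kwargs):
        # self.extra is the dict passed in when the adapter was created
        user_id = self.extra.get("user_id", "unknown")
        account_id = self.extra.get("account_id", "unknown")
        return f"[user={user_id} account={account_id}] {msg}", kwargs


def logger_for_request(user_id, account_id):
    """Build a logger that carries the current request's context (hypothetical helper)."""
    base = logging.getLogger("app")
    return ContextAdapter(base, {"user_id": user_id, "account_id": account_id})
```

With this in place, something like `logger_for_request(42, "acme").warning("payment failed")` would log the message with the user and account in front of it, so you never have to remember to add them by hand.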

In addition to logs, monitoring is also really important. Most production servers have a health check, a way to figure out if the server is up and can access things like the database and other external services. Why external services? Because a server that can’t talk to the cache/social network/payment provider/etc isn’t good for very much. More things they don’t tell you in school :) Like logging, health checks take practice too. Comprehensive checks are great, but you may not want your server to say it’s down when an external service is responding slowly either.
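One way to handle the "comprehensive, but don't cry wolf" problem is to separate dependencies the server can't live without from ones it can limp along without. This is only a sketch under that assumption; the `health_check` function, the check names, and the "ok"/"degraded"/"down" statuses are all hypothetical.

```python
def health_check(checks, critical):
    """Run each named check and summarise the result.

    checks:   dict mapping a name to a zero-argument function returning True/False
    critical: names that must pass for the server to count as up
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that blows up counts as a failed check, not a crashed endpoint
            results[name] = False

    failed = {name for name, ok in results.items() if not ok}
    if failed & set(critical):
        status = "down"
    elif failed:
        status = "degraded"
    else:
        status = "ok"
    return {"status": status, "checks": results}
```

Here a failing database check would report "down", while a failing cache check (if you left it out of `critical`) would only report "degraded". Where to draw that line is exactly the part that takes practice.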

Metrics are important too. Whether you use a third party analytics service or roll your own with Graphite and StatsD, it’s really useful to be able to see at a glance whether your app is behaving normally. At a minimum you probably want to know how many requests you’re getting per minute and how many errors, plus anything domain specific like how many level starts or level ends per minute, how many purchases, how many new signups, etc.
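If you go the StatsD route, counting an event is about as simple as it gets: a counter is a short plain-text packet of the form `name:value|c` sent over UDP (8125 is the usual default port). The sketch below assumes a StatsD daemon listening on localhost, and the function and metric names are invented for the example; in practice you would probably use an existing client library instead.

```python
import socket


def statsd_counter_packet(name, value=1):
    """Format a StatsD counter increment, e.g. b'signups:1|c'."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget a counter increment over UDP (host and port are assumptions)."""
    packet = statsd_counter_packet(name, value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

Calling something like `send_counter("level.start")` or `send_counter("errors")` at the right places is enough to get per-minute graphs out the other end.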

Yet another thing you probably didn’t do in school is code reviews. In school, it generally doesn’t matter if anyone else understands your code. At work, it’s important that someone else is able to fix your code if anything goes wrong while you’re on vacation or home sick or away at a conference or if you change jobs. Having one point of failure is always a bad idea whether you’re talking about servers or people who are the only person who really knows x.

For very similar reasons, documentation is important too. Aside from people getting sick or going on vacation at inconvenient times, documentation is really useful when you come back to a piece of code months later or when you’re working on something that somebody else wrote. It’s also great for helping new hires get up to speed. Just because it took months for you to learn the codebase doesn’t mean it has to be that hard for the next new hire.

Unit tests also become a lot more important when you’re working on production code. They’re not just for getting your teacher off your back, they’re a great safety net when you have to change things and want to make sure you didn’t break anything that used to work. In school you hardly ever return to previous assignments, but at work you change things over and over again and it becomes really helpful to have a way to make sure you didn’t break stuff that doesn’t involve manually checking everything. Also, the more you can automate tests for, the less you have to test manually.

I’m sure I’ve just scratched the surface of things that you didn’t learn in school about production code. Readers, what most surprised you about production code?

How do binary trees work, anyway?

Programming interview concepts are back! You can find the rest of those posts under the how does it work? tag.

Before we talk about how a binary tree works, we should probably talk about what it is. A binary tree is just a tree data structure where each node has at most two children. Thank you wikipedia :) There’s nothing preventing you from making a tree where each node has more than two children, it just wouldn’t be a binary tree. A tree, binary or not, isn’t necessarily sorted either.

A public domain work found on the wikimedia commons. By Derrick Coetzee

A tree works a lot like a linked list: each node has references to its children, allowing you to walk down the tree. You can also implement a binary tree using a plain old array (see the picture to the right), but that can waste a lot of space if your tree isn’t both balanced and complete. Balanced, when we’re talking about trees, means both sides have the same number of nodes (or at least close to the same number; stricter definitions are usually phrased in terms of the heights of the two sides differing by at most one), and complete means that on each ‘level’ all the nodes are filled in. In the example binary tree below, it’s not balanced because one side has 5 nodes and the other only has 3, and it’s not complete because there’s one node missing on the third level and two or three nodes missing on the fourth, depending on whether your definition of ‘complete’ allows for any leaf nodes on the right-most end of the last level of the tree to be missing. That doesn’t actually have much to do with the rest of this post, I just thought it was nifty :)
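For the array version, the usual trick is to store the tree level by level, so the children of the node at index `i` live at `2*i + 1` and `2*i + 2`. Here is a small sketch of that layout; the example `tree` and the `children` helper are made up, and `None` marks a slot where a node is missing (which is exactly the wasted space mentioned above).

```python
# Level-by-level layout: index 0 is the root, then its two children, and so on.
# "B" has no left child, so its slot (index 3) is wasted on a None.
tree = ["A", "B", "C", None, "D"]


def parent_index(index):
    """Index of the parent of the node stored at `index` (the root has none)."""
    return (index - 1) // 2


def children(tree, index):
    """Return the values of the children that actually exist for tree[index]."""
    found = []
    for child in (2 * index + 1, 2 * index + 2):
        if child < len(tree) and tree[child] is not None:
            found.append(tree[child])
    return found
```

A badly unbalanced tree makes this worse fast: a chain of n nodes that only ever goes right needs on the order of 2^n slots, almost all of them empty.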

A binary search tree is a special case of tree where each node has 0-2 children and the nodes are sorted so that you can perform a binary search. In my post about how a binary search works, I mentioned that binary trees aren’t actually the fastest data structure to use for a binary search because it’s hard to balance a binary tree.

A public domain work found on the wikimedia commons. By Derrick Coetzee

How do you balance a binary tree, anyway? Well, if you sorted all of your items before you added them to your tree, then you could start with the item in the middle, then add the middles of the two halves, then add the middles of those halves, and so on until you’ve added everything. That method only works if you already have all of the items you’re going to put in the tree and can be bothered to sort them, though. What do you do if you need to add more items later?
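The sorted-items approach described above could look something like this. It's a rough sketch that assumes you already have everything in a sorted list; the `Node` class and the `build_balanced` and `height` names are invented for the example.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def build_balanced(sorted_items):
    """Build a balanced binary search tree from an already-sorted list."""
    if not sorted_items:
        return None
    mid = len(sorted_items) // 2
    # The middle item becomes the root; each half becomes a subtree
    return Node(
        sorted_items[mid],
        build_balanced(sorted_items[:mid]),
        build_balanced(sorted_items[mid + 1:]),
    )


def height(node):
    """Number of levels in the tree (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))
```

Seven sorted items end up in a tree three levels deep, which is as short as a seven-node binary tree can be.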

Basically you need to re-arrange your tree until it’s balanced (or at least close enough) again. Some of the ways you can do this are with self-balancing trees like red-black trees or AVL trees. Both of those trees add some extra data to each node to help it both figure out if it’s out of balance and get it back into balance.

In a red-black tree, the extra data is the “colour” of the node. Because there are only two colours this only takes one extra bit to store. The colours, by the way, are totally arbitrary so don’t knock yourself out trying to understand the deeper meaning behind them :) According to one of the inventors of the red-black tree, red and black were the colours that looked the best on the laser printer they had available, which they were eager to use since they worked at Xerox PARC where the laser printer was invented.

A red-black tree uses the following rules to keep itself from getting badly unbalanced:

  1. A node is either red or black.
  2. The root is black. This rule is sometimes omitted. Since the root can always be changed from red to black, but not necessarily vice versa, this rule has little effect on analysis.
  3. All leaves (NIL) are black.
  4. If a node is red, then both its children are black.
  5. Every path from a given node to any of its descendant NIL nodes contains the same number of black nodes. Some definitions: the number of black nodes from the root to a node is the node’s black depth; the uniform number of black nodes in all paths from root to the leaves is called the black-height of the red–black tree.
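To see how the rules fit together, here is a small checker that tests whether a tree obeys them. It's only a sketch: it validates an existing tree rather than doing any insertion or rebalancing, the `RBNode` class and function names are made up, and `None` stands in for the black NIL leaves.

```python
RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, value, colour, left=None, right=None):
        self.value = value
        self.colour = colour
        self.left = left
        self.right = right


def black_height(node):
    """Return the subtree's black-height, or raise ValueError if a rule is broken."""
    if node is None:
        return 1  # rule 3: NIL leaves count as black
    if node.colour not in (RED, BLACK):
        raise ValueError("rule 1: node is neither red nor black")
    if node.colour == RED:
        for child in (node.left, node.right):
            if child is not None and child.colour == RED:
                raise ValueError("rule 4: red node with a red child")
    left = black_height(node.left)
    right = black_height(node.right)
    if left != right:
        raise ValueError("rule 5: paths have different numbers of black nodes")
    return left + (1 if node.colour == BLACK else 0)


def is_red_black(root):
    """True if the tree obeys all five rules (including rule 2: black root)."""
    if root is not None and root.colour != BLACK:
        return False
    try:
        black_height(root)
    except ValueError:
        return False
    return True
```

A black root with two red children passes. A black root with a single black child fails rule 5, because the path through that child has one more black node than the path straight to the missing child's NIL leaf.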

Red-black trees do some funny business with their nodes – what you would think of as a leaf node actually has two leaves that are always black and don’t contain any information. If you’re wondering “well if they’re always black and don’t contain any information, can’t I just pretend they exist and not waste memory on them?” the answer is yes, you can totally do that.

The thing with the pretend leaves is that you need them for the third rule about leaves always being black. When you add a node to a red-black tree, you don’t add it as a real leaf, you add it to the closest node that has a value and then pretend it has black leaves. For the first couple of nodes after the root, this is super simple – the root is black, the new nodes are red, their pretend leaves are black, and everything is good. If you have more than a couple nodes in your tree, things get complicated. That’s where you break out rotations. Because this post is already pretty long I’m going to refer you back to the wikipedia article on red-black trees and this youtube video by OnlineTeacher. Normally I kind of loathe videos, but the pictures in that one are actually really helpful. Tree rotations are one of those things that are really simple when you can see them and really, really confusing when you have to describe them in words. The short version is that because of the way binary search trees are arranged, you can rotate nodes back and forth around the root of your subtree, which is going to make precisely no sense unless you already know what I’m talking about :)
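If code is easier to follow than words, a single left rotation looks roughly like this. It's a bare-bones sketch with no colours and no parent pointers, so it is not a full red-black implementation, and the `Node`, `rotate_left` and `in_order` names are invented for the example.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def rotate_left(root):
    """Rotate the subtree left and return its new root.

    The right child moves up, the old root becomes its left child, and the
    right child's old left subtree is re-attached under the old root.
    """
    pivot = root.right
    if pivot is None:
        return root  # nothing to rotate around
    root.right = pivot.left
    pivot.left = root
    return pivot


def in_order(node):
    """Values in sorted order; a rotation should never change this."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)
```

Rotating the lopsided chain 1 → 2 → 3 to the left puts 2 on top with 1 and 3 as its children, and the in-order walk still gives 1, 2, 3. That's the whole point: the shape changes but the sort order doesn't.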

My understanding of AVL trees is that they work on largely similar principles to red-black trees but because they’re more rigidly balanced they’re faster on retrieval but slower on updates. Everything is a tradeoff.

And finally, because I keep hearing about it as an interview question, how do you reverse/invert a binary tree?

First, let’s define what reversing a binary tree actually means. Before I looked this up I thought it had something to do with swapping the root and the leaves, which makes no sense because tree structures normally have only one root node. It turns out the question actually means swapping the left and the right children of each node.

From my quick bout of googling, it sounds like a fairly simple recursive algorithm to walk the tree and swap each node’s right child for its left child. Now you know.
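For the record, that recursive swap could look something like this. The `Node` class and the `invert` name are just placeholders for the sketch, and it modifies the tree in place.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def invert(node):
    """Swap the left and right children of every node, in place."""
    if node is None:
        return None
    # Both sides are inverted first, then assigned to the opposite child
    node.left, node.right = invert(node.right), invert(node.left)
    return node
```

Note that if you do this to a binary search tree, it ends up sorted in the opposite direction, so the usual search code won't work on it any more without flipping its comparisons too.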

As you might have noticed, my research here centered pretty heavily on wikipedia so if I messed anything up, tell me about it in the comments.

IDE of the day

If you work with javascript, you need to try JetBrains WebStorm. It has a bunch of really great features I don’t use (I hear there’s support for node and angular and typescript) and sweet, sweet auto complete :) I still wish it was possible to have strongly-typed Eclipsey levels of auto complete with javascript, but something is much better than nothing. It doesn’t always work perfectly, but WebStorm is pretty good at taking you to the definition of a function or object too.

Full disclosure: JetBrains changed their licensing scheme not so long ago and it’s really confusing now. You do not have to keep paying forever! Once you’ve paid for 12 months you get a perpetual fallback licence that gives you only security updates but your product still works.

Either way, you can try it out for free, so why not give it a shot?

ps If any readers know of a better JS IDE I would absolutely love to hear about it.

Passion isn’t everything

Unrelated image from pexels.com to make this post look nicer in social media shares.

This post is a bit of a counterpoint to my previous post about why I love programming even though it’s frustrating as hell sometimes. While I feel very lucky that I get to make a living doing something I love, I really don’t like my industry’s obsession with passion. It’s great if you feel passionate about programming, but it’s simply not necessary.

“But Mel,” you say “do you really want to work with some checked-out code monkey who half-asses everything, doesn’t give a shit about technical debt, and counts the minutes until it’s time to go home?” That’s a false dichotomy right there. There are many, many more choices than “passionate programmer” and “checked-out code monkey.” No, of course I don’t want to work with someone who doesn’t care about doing a good job. Fortunately, there are only about a zillion other points on the spectrum of “total passion” to “no passion at all.” Not being the most passionate programmer who ever lived in no way means you don’t care about doing a good job or want to improve. It just means you have other things going on in your life. Honestly, I think that’s healthier than being obsessed with just one thing.

Not to mention having other interests actually makes you a better coder. Seriously, go read that article, it’s really good. Having just one obsession makes it much more likely that you approach problems from just one direction, whereas having other interests allows you to come at problems from a different perspective. Take Adam Tornhill for example, he took ideas from forensic psychology and applied them to code analysis to get some really interesting results.

Obviously you can be passionate about more than one thing, but that still doesn’t mean passion is necessary to be a good programmer. You can take pride in your work even if you aren’t in love with what you’re working on. Using myself as an example, I stocked shelves at Wal-Mart before I moved to Victoria to go to Camosun. Was I passionate about taking things out of boxes and putting them on shelves? Of course not, but I still took pride in doing a good job. I went home every day knowing I made the department managers’ lives easier, not harder. Pro-tip: doing nothing is more helpful than leaving a mess someone else has to clean up.

Or to use a more relevant example, I don’t like doing front-end layout. The endless fiddling and wondering if that element would look better where it is or 5 pixels to the right isn’t satisfying for me, it’s just annoying. But if you give me a screen mockup, the end result will match it no matter how much swearing it takes. I don’t enjoy the process, but I do enjoy knowing I did a good job.

What’s really important is caring about the quality of your work, we just use “passion” as a proxy for that because we don’t know how to measure it. Unfortunately, we don’t really know how to measure passion, either. Sure, somebody who has side projects or contributes to open source is probably passionate about programming, but that doesn’t mean someone with no public git repos doesn’t give a shit. Simply having the time to work on side projects is an enormous privilege. People who have kids, or sick relatives they need to take care of, or who need to freelance to bring in extra cash, or who have disabilities or are neurodivergent, or would rather spend time with friends and family than do more work outside of work, or just have time-consuming hobbies may simply not have the time or energy to perform passion by working on publicly shareable side projects.

None of those things mean you don’t care deeply about programming, they just mean that you have other responsibilities or interests. Hint for employers: people who have responsibilities are more focused when they’re at work because they know they can’t put in a few extra hours later and they really hate changing jobs because it’s even more of a pain in the ass. That 20-something rockstar (ugh) dev who has no serious ties to the area might decide to move to San Francisco tomorrow. Your 30-something dev who has a mortgage and a kid is a lot less likely to randomly sell their house and uproot their family. Not that you shouldn’t trust single 20-somethings, but can we please stop pretending they’re the only worthwhile devs?

If you must look for passion, at least look for actual passion and not “free time and nothing better to do”. Ask candidates what they love about programming. Ask them if they have opinions about tools and languages and programming styles. Ask them what they would learn if their job gave them some free time and a resource budget.

But if you’re realistic about what you need, I think you’ll agree that passion is a red herring and what actually matters is caring about doing a good job. Fortunately, that’s a lot easier to find.

Does it actually need to be optimized?

Unrelated image from pexels.com to make this post look nicer in social media shares.

Learning to focus on one tiny part of your problem and ignore everything else is a really useful skill as a dev, but ironically it can also get you into trouble. It’s just as important to keep the bigger picture in mind as it is to break your problem down into little pieces and do them one at a time. Why yes, this is one of those posts that is as much for me as it is for you :)

Just because a process is slow doesn’t mean it needs to be optimized. If it hardly ever gets called, who cares if it’s slow? I know, I know, it feels wrong to see something that’s slow and leave it that way, but it’s not worth the dev time unless the process gets called often enough. Slow alone isn’t necessarily bad, it’s slow and called a lot that’s a problem. If you’re lucky you have metrics to look at and know for a certainty what gets called often, otherwise you’ll be making an educated guess. This is where understanding your application and thinking about how all the different pieces fit together comes in handy.
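If you don't have metrics yet, even a crude call counter can turn the educated guess into a slightly more educated one. This is a sketch only: the `tracked` decorator, the `call_stats` dictionary, and `worst_offenders` are invented names, and a real app would send these numbers to its metrics system rather than keep them in memory.

```python
import time
from collections import defaultdict
from functools import wraps

# name -> how many times it was called and how long it took in total
call_stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def tracked(func):
    """Record call count and total time for the decorated function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            stats = call_stats[func.__name__]
            stats["calls"] += 1
            stats["seconds"] += time.perf_counter() - start
    return wrapper


def worst_offenders():
    """Function names ordered by total time spent, biggest first."""
    return sorted(call_stats, key=lambda name: call_stats[name]["seconds"], reverse=True)
```

Sorting by total time rather than time per call is the point: a slow function that runs once a day will sit near the bottom of the list, and a merely okay function that runs thousands of times a minute will float to the top.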

For example, anything you can do asynchronously is not going to be your first priority for optimization. If you can hide the processing time from the user, you may never need to optimize it. Initialization, while it is the user’s first impression of your app, also happens only once a session. First impressions are certainly important, but other actions in your app will happen much more often which makes them better targets for optimization. Assuming your load time is reasonable, of course :)

Of course, learning experiences are important too, so don’t worry about this too much if you’re a junior developer. The way you learn what isn’t worth spending time on is by messing up, it’s an unavoidable part of the process. If in doubt, talk it over with your team lead/dev lead/someone with more experience, learn as much as you can from other people’s mistakes. You’ll also learn more about programming by optimizing, so even if the end result isn’t exactly critical to your application, the practice you’ve gotten means it wasn’t a total waste of time.

One of the concepts I’m still mastering as a programmer is that nothing exists in a vacuum. Context is much more important than any individual piece of code – it’s not, “speed this thing up or let it suck” it’s “speed this thing up or do one of a dozen other things that could be more useful.” Remember, your time has a value. Speeding up one piece of code, as personally satisfying as that can be, may mean much less to your users and your bottom line than a bugfix or a new feature. Developer time is expensive, it just makes sense to spend it on the things with the greatest returns.

The next time you’re about to optimize something, ask yourself how often that code is going to be called. It might sound too simple to be useful, but trust me, it’s a very easy question to forget.

.NET JWT library tip of the day

If you use JWTs (JSON web tokens) and need to generate or consume them in .NET, you might get the idea that the Microsoft library listed on JWT.io is the way to go. It’s by Microsoft, that means it’s official and trustworthy, right?

Don’t be fooled! I mean, it is official and trustworthy, but I had a horrible time trying to use it. Save yourself the trouble and use jose-jwt if you need to handle JWTs in .NET. The readme alone is a thing of beauty, it has a shockingly comprehensive set of examples for pretty much everything you would ever want to do with a JWT. The library really is as easy to use as the examples make it look. I was able to generate a JWT with it in just a few minutes and as you might have guessed from my posts about switching to Linux, my .NET experience is extremely out of date :)

Learn from my mistakes, just use jose-jwt and pretend you never heard of the Microsoft library.

Merry almost Christmas, everyone!

If you celebrate Christmas, here, have one of the very few Christmas carols I can listen to without wanting to run howling into the wilderness. And if you don’t celebrate Christmas, at least it’s almost over.

Debugging

There’s no way I can possibly say everything about debugging in just one blog post, but I can certainly share a few useful tips.

First, let’s talk about what debugging fundamentally is. It’s the art of seeing what actually is, not what you meant or what you thought. It’s going to be uncomfortable, and if you get too tied up in your own ego you won’t be able to do it at all. Years ago one of my teachers at Camosun told us (I’m paraphrasing heavily here because I don’t remember the exact words) that there’s no point insisting you didn’t change anything. If it used to work and now it doesn’t, you obviously changed something. Just accept that you broke it and start trying to fix the problem.

One of the first things you need to do when you’re debugging is to make sure you can reproduce the problem reliably. If you can’t do that, then you don’t really know what the problem is (or you’ve got some sort of unholy race condition bug and you’re beyond my help :) ). If you’re not the one who found the bug, ask the person who did if they can show you, or ask for more information if it came through a helpdesk and you don’t have direct access to the user who found the bug.

Once you can reproduce the problem, you’ll be able to track down what’s going wrong and figure out whether a fix actually… fixes the problem. The first step I recommend after reproducing the issue is double checking all of your inputs, even the most stupid simple stuff you’re sure you couldn’t possibly have gotten wrong. Last week I thought I had broken staging when I actually just hadn’t chosen the right value in a dropdown box. Garbage in, garbage out, as they say.

After that, you’re going to be very tempted to just read through your code and hope you can spot the problem. Resist this temptation! I’m always pretty sure I know where the problem is or that it’ll jump out at me right away, but it almost never does unless it’s an extremely simple bug. Read over the code once if you really want to, but then move on to narrowing down exactly where the bug is. Believe me, it’s faster than staring blankly at your code and feeling dumb.

If you have a particularly chatty log, you may be able to start narrowing things down while you reproduce the issue. Start looking after the last log message you see before the bug happens. If you’re very lucky something will be obviously wrong close to the log line you were looking for.

If you’re not quite as lucky, you’re going to need to run the code locally and start commenting things out. Assuming you have some idea where the problem is happening, start dividing the method it could be in, in half. Either comment half of it out or add a log line half way through and see if you see that log line before or after you reproduce the problem. Keep narrowing it down until you know exactly where the problem is. Once you know exactly where the problem is, you should now know what’s going wrong if not exactly why. It’s not unusual to have to trace back through your code to find the place that set up the issue that wasn’t triggered until later. Bad config, for example, may not cause an actual bug until long after it’s saved.
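The halving approach is really just a binary search over the steps in your code. Here is that idea as a sketch, with some big assumptions: you can number the steps, you have some way to check whether things still look right after step n (the hypothetical `state_is_good_after` function), and once things go wrong they stay wrong.

```python
def first_bad_step(step_count, state_is_good_after):
    """Return the index of the first step after which the state is bad.

    state_is_good_after(n) should return True if everything still looks
    right after running steps 0..n. Returns None if nothing ever goes wrong.
    Assumes that once the state is bad, it stays bad.
    """
    if step_count == 0 or state_is_good_after(step_count - 1):
        return None
    low, high = 0, step_count - 1
    while low < high:
        mid = (low + high) // 2
        if state_is_good_after(mid):
            low = mid + 1   # the problem is in the second half
        else:
            high = mid      # the problem is at mid or earlier
    return low
```

In real life the "check" is usually you, adding a log line and reproducing the bug, but the arithmetic is the same: ten steps take about three or four tries instead of ten.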

The most important things you can remember when you’re debugging are to be systematic and to not make assumptions. Don’t assume that your input is good. Don’t assume that a certain piece of code can’t be the problem because it hasn’t been changed in ages/just passed testing/doesn’t seem to be related. Don’t assume that your config is what you think it is. Don’t assume that you know what’s going on – if you did, you wouldn’t have written a bug in the first place :)

Best Practices

Let’s talk about questionable code and the best practices to fix it. Even if you’ve been a developer for years it’s good to do a quick review once in a while and make sure you haven’t picked up any bad habits.

Commenting out code instead of deleting it

I’m as guilty of this as anyone else, but honestly it’s a waste of time. If you ever actually need that code back you can pull it out of source control, that is what it’s for. As much as I’ve wanted to hedge my bets and just comment out that block so it’s easier to put it back if I need it again, I’ve almost never actually needed to do that. The commented out code just hangs around in your file forever, confusing new developers and forcing you to wonder if you should update it every time you work on that method. That’s a lot of trouble to go through over code you’re never going to use again. Just delete it, it’s easier. If you’re really worried about getting it back, put the delete in its own commit and add a good descriptive commit message.

Cryptic variable names

This is a huge pet peeve of mine. Characters aren’t rationed! You can use as many as you need to make the purpose of your variable clear, nobody’s going to come and take them away. Excessively long names are annoying too, but they’re not nearly as bad as wasting time figuring out that “un” actually means user name. Autocomplete has been around for a long time, the excuse of not wanting to type long variable names is not going to fly. Give your variables meaningful names, the inconvenience will pay off in the long run.

Long methods

Again, there’s no rationing! You do not need to stuff everything and the kitchen sink into one method, you’re allowed to have more than one. Refactoring has been around for a long time too, any reasonable IDE should let you extract a method in a few clicks. Even if you have to manually refactor, the pain is worth it. With smaller well named methods, you can quickly skim through code and figure out what’s going on. Without them, you can waste a whole lot of time trying to figure out which part of that ginormous method you actually need to change.

Methods with huge numbers of parameters

This issue is closely related to the last one. If your method has an excessive number of parameters, it’s probably doing too much and it’s likely too long as well. This one can be painful to fix depending on the rest of your architecture, but if you can manage it without tearing everything apart it’ll make things much easier the next time you need to look at that code or use that method. The more parameters you have, the more chances you have to mix some of them up and introduce weird bugs into your code. If you really do need lots of parameters, at least bundle them up in a <your method>Config object to make it harder to mix them up.
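As an example of the config object idea, here is a sketch using a Python dataclass. The `send_email` function and every field on `SendEmailConfig` are hypothetical, and the function just builds a string so the example stays self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SendEmailConfig:
    """Everything send_email needs, with names attached to every value."""
    recipient: str
    subject: str
    body: str
    reply_to: str = ""
    retries: int = 3


def send_email(config: SendEmailConfig) -> str:
    # A stand-in for the real work: just describe what would be sent
    return (
        f"To: {config.recipient} | Subject: {config.subject} "
        f"| retries={config.retries}"
    )
```

Compare `send_email(SendEmailConfig(recipient="a@example.com", subject="Hi", body="Hello"))` to a call with five positional strings and numbers in a row. With the names spelled out at the call site it's much harder to swap the subject and the body by accident.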

These are far from the only problems you’ll see in your code, but looking out for these simple issues is a good place to start cleaning up your codebase.