melreams.com

Nerrrrd

Merry almost Christmas, everyone!

If you celebrate Christmas, here, have one of the very few Christmas carols I can listen to without wanting to run howling into the wilderness. And if you don’t celebrate Christmas, at least it’s almost over.

 

 

 

Java best practices: synchronization

Java’s synchronization can be really helpful, but it can also get you into plenty of trouble. Synchronization is in no way a magic wand that you can wave around to get rid of multi-threading issues; you have to understand how to use it.

In Java (and many other languages, but Java’s what I’m familiar with), synchronization prevents threads from accessing the same data at the same time. Concurrency (multiple threads sharing access to the same variables) is a gigantic subject, so I’m going to gloss over it here by saying that things can go wrong in deeply weird ways when threads accidentally overwrite each other’s updates to a variable or work from different copies of the shared variable. Synchronization can stop that from happening if you use it correctly, but at the cost of a hit to performance and the need to be very very careful that you don’t introduce deadlocks.

public synchronized void example() {
   //do things
}

Using the synchronized keyword on a method (like in the example above) synchronizes access to that entire method, which is generally pretty safe, but unless you have a very small method you may be synchronizing a bigger chunk of code than you absolutely need to, which is more of a performance hit than necessary. Only one thread at a time can run code that’s synchronized on a given object (for a synchronized instance method, that object is the instance itself), so every other thread that wants in has to wait, and that really slows down processing. The larger a chunk of code you synchronize, the worse the performance hit is.

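class Example {
   // Message stands in for whatever shared object you need to protect.
   // final, so the object we lock on can never be swapped out from under us
   private final Message m;

   public Example(Message m) {
       this.m = m;
   }

   public void doThings() {
       // work that doesn't touch m can stay outside the lock
       String name = Thread.currentThread().getName();
       synchronized(m) {
           //actually do things with m
       }
   }
}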

The synchronized block, while it makes it easier to synchronize only the part you need, also makes it easier to mess things up by introducing a deadlock. A deadlock happens if thread A needs locks on objects Y and Z, in that order, and thread B needs locks on objects Z and Y, in that order. If A locks Y and waits for Z to be unlocked, and B locks Z and waits for Y to be unlocked, both threads wait forever and nothing happens until you restart your program. If you lock on multiple objects (which you should definitely do if you need to update multiple shared objects in the same block of code), make sure that you absolutely always lock on those objects in the same order. The same problem applies to MySQL deadlocks, which can really suck to debug if your codebase is large enough.
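Here’s a minimal sketch of the always-lock-in-the-same-order rule. The Account class, its id field, and the transfer method are all made up for this example, and it assumes every account has a unique id so there’s always an unambiguous order to lock in:

```java
// Hypothetical example: Account, id and transfer() are invented for illustration.
class Account {
    final int id;          // assumed unique; only used to decide lock order
    private long balance;

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    synchronized long getBalance() {
        return balance;
    }

    // Always lock the account with the lower id first, no matter which
    // direction the money is moving, so two threads can never each hold
    // the lock the other one is waiting for.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }
}

class LockOrderingDemo {
    public static void main(String[] args) throws InterruptedException {
        Account y = new Account(1, 1000);
        Account z = new Account(2, 1000);
        // Thread a moves money y -> z while thread b moves it z -> y
        Thread a = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) Account.transfer(y, z, 1);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) Account.transfer(z, y, 1);
        });
        a.start();
        b.start();
        a.join();
        b.join();
        // Both threads finish and no money goes missing: prints 2000
        System.out.println(y.getBalance() + z.getBalance());
    }
}
```

Threads a and b want the two locks in opposite directions, which is exactly the Y-then-Z versus Z-then-Y setup described above, but because transfer always takes the lower id first they can’t end up waiting on each other.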

While we’re at it, according to Stack Overflow, synchronized(this) can be dangerous because it locks on the entire instance. If you have another block that synchronizes on this (or a synchronized method, which locks on this too), it can’t run until the first lock is released. It also means any outside code that synchronizes on your object is competing for that same lock, and neither side can run while the other holds it, which can cause serious performance problems if you do it enough.
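One common way around that is to synchronize on a private object that only your class can see. This is just a sketch of the idea; Counter and its lock field are hypothetical names, not something from a real library:

```java
// Hypothetical example: Counter is invented for illustration.
class Counter {
    // Private lock: nothing outside this class can synchronize on it, so
    // outside code can't block increment() and increment() can't block it.
    private final Object lock = new Object();
    private int count = 0;

    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }
}

class PrivateLockDemo {
    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Thread worker = new Thread(counter::increment);
        // Some outside code decides to lock on the counter itself...
        synchronized (counter) {
            worker.start();
            // ...but the worker still gets through, because increment()
            // uses the private lock rather than the Counter instance.
            worker.join();
        }
        System.out.println(counter.get()); // prints 1
    }
}
```

If increment were a synchronized method (or used synchronized(this)), the join inside that outer block would wait forever, because the worker would need the very lock the main thread is holding.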

Aside from being very careful when you do use synchronized, the best advice I can give you is to use it as little as possible. If you can, just don’t have shared state. Particularly in web programming, you generally shouldn’t keep state around for longer than it takes to process a request.

Finally, if you use synchronized and mess it up, don’t waste time beating yourself up about it. Concurrency is even worse than timezones and everyone messes it up sometimes.

Degrees aren’t everything

A common worry I see in self-taught developers is that not having a degree means that you’re not a good programmer and no one will hire you. I’m not going to lie, having a degree does make it easier to get an interview, but it in no way guarantees that everyone with a degree is a better programmer than everyone without.

Here’s a fun fact about hiring developers: having a degree tells interviewers so little about whether you can code that people came up with the idea of asking candidates to code a very, very simple math “game” called fizzbuzz to figure out if you can write a for loop all by yourself. I’m completely serious, in the mid-to-late 2000s fizzbuzz was all over the programmer blogosphere. If you poke around online you will likely find a bunch of criticism about how fizzbuzz is too simple to tell you anything interesting about a junior programmer candidate, which I think is true but is not the point of this post.
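In case you haven’t run into it, the usual version of fizzbuzz goes: print the numbers from 1 to 100, but print “Fizz” for multiples of 3, “Buzz” for multiples of 5, and “FizzBuzz” for multiples of both. Here’s one way it could look in Java (the class and method names are just what I picked for this sketch, and interviewers vary the details):

```java
class FizzBuzz {
    // Returns what should be printed for a single number.
    static String fizzBuzz(int n) {
        if (n % 15 == 0) return "FizzBuzz"; // multiple of both 3 and 5
        if (n % 3 == 0) return "Fizz";
        if (n % 5 == 0) return "Buzz";
        return Integer.toString(n);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 100; i++) {
            System.out.println(fizzBuzz(i));
        }
    }
}
```

That really is the whole thing: one loop and a few conditionals.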

Three women of colour having a meeting in a boardroom
Photo provided by WOCInTechChat under a CC Attribution-ShareAlike License

Back to my point: it sounds completely ridiculous that you would need to ask a new college/university grad to prove they can write a for loop and a couple of conditionals. How could someone graduate and not be able to code? That question really deserves its own blog post, but part of it is that memorizing facts for a test is a very different skill than writing code on the spot to solve a problem.

Ridiculous or not, people started asking developers to code fizzbuzz for a reason and it’s not because hiring without it worked so well. Fizzbuzz exists as a programming concept because interviewers needed a quick way to weed out people who simply couldn’t program at all.

If you don’t have a degree, don’t feel bad. If you can program at all, then you’re already ahead of the game. You’re probably just as good a coder as anyone who does have one. In a lot of ways being self-taught is more impressive because once you make the decision to go to college/university you’re essentially locked in. Aside from feeling like you have to get your money’s worth once you’ve started paying for a degree, there’s a massive amount of social pressure not to drop out and feel like you’ve disappointed your family and friends.

If you study something on your own time, on the other hand, then it’s much easier to just stop when you’re bored or it’s hard or you’d rather go have a pint with some friends. When nobody will know or necessarily care that you stopped, it’s a lot harder to keep going.

To be fair, having a degree/diploma/certificate from a bootcamp/etc does open doors, and you can end up with really frustrating gaps in your knowledge if you’re self-taught. My husband is a sysadmin but he grudgingly does a little bit of programming when he has to (he’s weird and thinks setting up servers is more fun than writing code). Not so long ago I had to tell him about maps/associative arrays/dictionaries because the last programming class he took was in high school and the language they used didn’t have maps. Turns out there was a point to all the time my teachers spent hammering datatypes into our heads in college after all :)

A degree or a diploma doesn’t mean anyone is special or a better programmer than you are. It really just means they had the good luck and inclination to pick up a degree and some ability to follow through, at least when a large amount of money and the prospect of their parents being disappointed is on the line. Who knows, maybe you’ll be the one asking new grads to code fizzbuzz one day.

Cmder rocks!

Cmder is an awesome tabbed command line interface for windows. Unlike the regular windows console, cmder is resizeable, includes handy linux commands like grep, and uses a font that isn’t hideous. Honestly, while the other features are great, being able to resize the freaking window was one of the biggest selling points for me. It’s incredibly irritating to try to read a log in a window that’s only 80 characters wide when you’re running a java server that sometimes throws very wide error messages.

Cmder can also be integrated with programs like Sublime Text. I haven’t done it myself but it’s cool to know I could. For git users, cmder has another really cool little feature – where the prompt usually shows you which directory you’re in, cmder adds which branch you have checked out to the end, and it turns that branch name red if you have changes you haven’t committed. It’s amazing how helpful that is.

Cmder with current git branch

If you use windows and you run anything from the command line, give cmder a try. Shiny shiny tabs await you :)

 

Different languages are good for different things

As you learn to code and learn new programming languages you’ll often hear that different languages are good for different things. Technically you can do just about anything in any language, so for a long time that never meant much to me. Once you get past basic conditionals and loops, there actually are pretty major differences in how easy it is to do different things in different languages.

Here’s a handy example: the other day I wanted to figure out how much I spend on average each month so I could figure out how much I can reliably throw into my RRSP. Okay, use mint, you say. Not so fast there! I only wanted to know about my expenses NOT including RRSP and TFSA contributions, and I wanted to leave out the month I got married because it’s a huge outlier and screws up my average :) If you can get mint to do that, I’d love to hear how.

What I ended up doing was downloading my transaction history as a csv from my bank and manually removing the stuff I didn’t want to include. Then I needed to create monthly totals (so I could see if those looked reasonable) and an overall average somehow. I was hoping I could do that with a simple formula in a spreadsheet, but after fiddling with it for a bit I decided I’d rather poke myself in the eye than stick with that idea.

Python to the rescue! Not so long ago I was a mentor at a Ladies Learning Code workshop about data processing with Python. At the end of that workshop we ended up with a little script that read in a csv, did some processing, and output the results, which is exactly what I needed. I started with that script and ended up with this:

# Import the csv library
import csv
import datetime

# Open the statement file
statement_file = open('./statement.csv')

# Convert it to a csv_data structure
statement_data = csv.DictReader(statement_file)
current_month = -1
current_year = -1
months = 0
grand_total = 0.0
running_total = 0
# Loop through each of the rows
for transaction in statement_data:
    # deposits have a blank in the withdrawal field, we only want withdrawals
    if transaction['withdrawal'] != '':
        #convert the string date to a date object so we can get the month
        date = datetime.datetime.strptime(transaction["date"], '%d-%b-%Y')
        #every time we hit a row where the month doesn't match the month from
        #the last row we know it's a new month and we need to update current
        # month & year and increment the month count
        if date.month != current_month:
            if current_month > -1:
                months += 1
                #print current_month instead of date.month because date.month
                #is the new month
                print(str(current_month) + "-" + str(current_year)
                      + " monthly total: " + str(running_total))
            current_month = date.month
            current_year = date.year
            running_total = 0
        running_total += float(transaction["withdrawal"])
        grand_total += float(transaction["withdrawal"])

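# one more print statement for the last month in the file
print(str(current_month) + "-" + str(current_year) + " monthly total: "
      + str(running_total))
# the loop only counts a month when the next one starts, so count the last
# month here too or the average comes out too high
months += 1

average = grand_total / months
print("avg: " + str(average) + " over " + str(months) + " months")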

Then I started thinking, that was weirdly easy considering that since college I’ve touched Python twice – once while preparing for that Ladies Learning Code workshop and once while actually mentoring at the workshop. That made me wonder how Java, the language I’ve used just about every day at work for the last nine years, would compare. So I ported my Python script to Java and this is what I ended up with:

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

public class Calc {
    public static void main(String[] args) {
        try {
            int monthCount = 0;
            int currentMonth = -1;
            int currentYear = -1;
            float grandTotal = 0;
            float runningTotal = 0;
            // open the statement csv
            Reader in = new FileReader("statement.csv");
            // parse it into CSVRecords so we can get values out more easily
            // unlike python's DictReader, this CSV library doesn't treat the
            // first row as a header row unless you ask it to (see
            // withFirstRecordAsHeader()), so I added the headers manually
            Iterable<CSVRecord> records = CSVFormat.DEFAULT.withHeader(
                    "account", "date", "desc", "num", "withdrawal", "deposit",
                    "balance").parse(in);
            // loop through each of the rows
            for (CSVRecord record : records) {
                String dateStr = record.get("date");
                String withdrawalStr = record.get("withdrawal");
                // deposits have a blank in the withdrawal field, we only want
                // withdrawals
                if (withdrawalStr != null && !withdrawalStr.equals("")) {
                    // java requires a lot of boilerplate around parsing a
                    // string into a date that we can get a month out of
                    DateFormat df = new SimpleDateFormat("d-MMM-yyyy");
                    Date transactionDate = df.parse(dateStr);
                    Calendar cal = Calendar.getInstance();
                    cal.setTime(transactionDate);
                    // every time we hit a row where the month doesn't match 
                    // the month from the last row we know it's a new month 
                    // and we need to update the current month and increment 
                    // the month count. technically we can get the month 
                    // using transactionDate.getMonth() but that method is 
                    // deprecated and I'm trying to set a good example
                    if (cal.get(Calendar.MONTH) != currentMonth) {
                        monthCount++;
                        if (currentMonth > -1) {
                            // in java months start from 0, add 1 so we get
                            // nicer looking output
                            System.out.println((currentMonth + 1) + "-"
                                    + currentYear + " monthly total: "
                                    + runningTotal);
                        }
                        currentMonth = cal.get(Calendar.MONTH);
                        currentYear = cal.get(Calendar.YEAR);
                        runningTotal = 0;
                    }

                    float withdrawal = Float.parseFloat(withdrawalStr);
                    grandTotal += withdrawal;
                    runningTotal += withdrawal;
                }
            }
            // one more print statement for the last month in the file
            System.out.println((currentMonth + 1) + "-" + currentYear
                    + " monthly total: " + runningTotal);
            float average = grandTotal / monthCount;
            // the one convenient thing java does here is string conversion -
            // it automatically converts non-strings into strings when you try
            // to add them to a string (this isn't autoboxing, which is
            // converting primitives like int to wrappers like Integer)
            System.out.println("avg: " + average + " over " + monthCount
                    + " months");
        } catch (IOException | ParseException e) {
            e.printStackTrace();
        }
    }
}

In a word, ugh. File processing scripts are not even slightly what Java is good at. Everything I needed for the Python script ships with Python in its standard library. For the Java version, I had to go hunt down a library and add it to my project, which required knowing that there probably was a library, knowing how to add it to my build path, and figuring out how to use it.

Even the least terrible csv library I was able to find for Java inside of five minutes of googling (Apache Commons CSV, if you’re curious) was much harder to use than Python’s builtin csv handling. Java’s date parsing also requires way more steps than Python’s does. And to run this in Java you have to know about main methods and all the boilerplate around them. Even if you just let your IDE generate that for you, you still need to know it exists and what it’s for.

Basically you have to fight Java to do something like my average monthly spending script. You can still do it, but it’s much more work than it has to be. Java is great for big enterprisey systems with APIs and multiple programmers working on different pieces, but it’s kind of painful for little scripts to parse a csv and do some processing. Python, on the other hand, rocks at stuff like that. I hope this helps you understand what people actually mean when they say different languages are good for different things.

What the hell is using port 80?

Every time I need to figure out what process stole port 80 from my local server I have to look up the command again, so I’m going to share it here for my fellow windows users in hopes I’ll finally remember it :)

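From a command shell running as admin (-a lists all connections and listening ports, -n shows addresses and ports as numbers instead of names, -o adds the owning process id, and -b adds the name of the executable, which is the part that needs admin):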

netstat -anob

Thanks as usual to stackoverflow, where the collective memory of nerds lives.

And here’s another fun fact for windows users: Skype may well be the process that’s hogging port 80. It uses ports 80 and 443 by default because they’re usually not blocked by firewalls and hey, it’s not like developers use IM >:(

 

Shitty hackathon!

Hackathons (and game jams) can be a lot of fun but there can also be a lot of pressure to build something that actually works and is good. Enter the stupid hackathon! The idea of a stupid hackathon is that you deliberately make something ridiculous and/or terrible. Suddenly the pressure is off and you can try stuff that you don’t know will work. A friend of mine heard about it and shared the idea, then a few of us got together and had a little shitty hackathon.

I built a directions page using the Google Maps javascript api that sends you to a burrito place (or for tacos, we only have so many Mexican restaurants in Victoria) first before you actually get to your destination. It picks one at random, so sometimes it sends you to Esquimalt by way of McKenzie and Shelbourne. And sometimes it sends you to Taco Time so you can regret your life choices :)

Parts of the maps api are really easy to use, but other parts, not so much. Displaying directions on a map was straightforward, and so was adding waypoints between the user’s chosen start and end points. Getting enormously detailed information about a place was surprisingly easy too. Autocomplete, on the other hand, just wouldn’t work for me and I have no idea why. The great (not actually great) thing about javascript is how things can completely fail to work and not give you any sort of error message to work with.

To be fair, the maps api documentation does include a lot of examples to work from, which is more than I can say about many other apis. If I ever get around to finishing autocomplete on my terrible directions page, I’ll start with one of their autocomplete examples and add my directions code to it. Then if I wanted to get really fancy I could search for a burrito place on the way to your destination and add that to the route instead of randomly sending people across town. But then again, where’s the fun in that?

WordPress plugin of the day

A few weeks ago wordpress decided it didn’t feel like actually publishing my scheduled posts anymore. Technically I could’ve poured hours into figuring out exactly why wordpress was misbehaving but you know, part of being a senior dev is prioritizing :) Sometimes the five minute “install a plugin” fix is good enough. There are a bunch of plugins that fix the scheduled-posts-not-actually-posting issue; the one I chose is called WP Missed Schedule and it seems to be working well. If you have a wordpress blog that doesn’t always do what you told it to, give WP Missed Schedule a try.

 

Bridge design pattern

It’s design pattern time again! This time, let’s talk about the bridge design pattern. The bridge pattern is officially meant to “decouple an abstraction from its implementation so that the two can vary independently” which is just all kinds of helpful. The design patterns book has a lot of great ideas but they’re not always communicated especially clearly. That definition of the bridge pattern sounds an awful lot like the adapter pattern, which is meant to “convert the interface of a class into another interface clients expect. Adapter lets classes work together that couldn’t otherwise because of incompatible interfaces.”

First let’s talk about what the bridge pattern actually is and then we can get into how the bridge and adapter are different.

The way I would define the bridge pattern is it decouples two abstractions so they can both vary without making a huge mess. Maybe that’s clearer and maybe it’s not, so how about an example. I’m going to steal John Sonmez’s web app type and theme example from his post on the bridge pattern. Let’s imagine we have a web application framework that we can base different applications on, like a blog or a news site or a store. That’s one abstraction. Now let’s imagine we want to add themes. That’s another abstraction. When we first start building themes, it’s tempting to subclass our first abstraction, the web app type, for each theme. If we have three app types and two themes, say light and dark, we end up with six subclasses: blog-light, blog-dark, store-dark, store-light, news-light, and news-dark. That’s kind of a mess, and it’s only going to get worse when we add more themes and more app types.

What would be a lot cleaner is if we separated app type from theme so they can each vary without requiring an explosion of subclasses. If theme was separate from app type and each app type had a theme (composition over inheritance!), we could add all the themes we wanted without having to create any more app type subclasses.

Or to put it another way (ascii art diagram also by John Sonmez):

When:

        A
     /     \
    Aa      Ab
   / \     /  \
 Aa1 Aa2  Ab1 Ab2

Refactor to:

     A         N
  /     \     / \
Aa(N) Ab(N)  1   2

Instead of having one complicated hierarchy, sometimes it’s easier just to have two simple hierarchies.
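Here’s roughly what the two simple hierarchies could look like in Java. This is my own minimal sketch of the app type and theme idea, not code from John Sonmez’s post, and every class and method name in it (Theme, WebApp, render and so on) is invented for illustration:

```java
// Hierarchy one: themes.
interface Theme {
    String backgroundColor();
}

class LightTheme implements Theme {
    public String backgroundColor() { return "white"; }
}

class DarkTheme implements Theme {
    public String backgroundColor() { return "black"; }
}

// Hierarchy two: app types. Each app type *has* a theme (composition)
// instead of there being one subclass per app type and theme combination.
abstract class WebApp {
    protected final Theme theme;

    WebApp(Theme theme) {
        this.theme = theme;
    }

    abstract String render();
}

class Blog extends WebApp {
    Blog(Theme theme) { super(theme); }

    String render() { return "Blog on " + theme.backgroundColor(); }
}

class Store extends WebApp {
    Store(Theme theme) { super(theme); }

    String render() { return "Store on " + theme.backgroundColor(); }
}

class BridgeDemo {
    public static void main(String[] args) {
        // Any app type can be combined with any theme
        System.out.println(new Blog(new DarkTheme()).render());   // Blog on black
        System.out.println(new Store(new LightTheme()).render()); // Store on white
    }
}
```

Adding a third theme means writing one new Theme class, and adding a news site means writing one new WebApp subclass. Neither change touches the other hierarchy.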

Hopefully the bridge pattern makes sense now. On to the adapter!

The adapter pattern has a really simple real-world analog: it’s the object equivalent of the power plug adapter you use when you travel to a country with different wall sockets and you want to be able to plug your laptop in.

For a more codey example, imagine you have an application that notifies the person on call when a status check fails or something weird happens in the log. Depending on how urgent the event is, the app sends an email, sends a text, or makes a phone call. The code that decides a notification should be sent shouldn’t know about the details of sending texts vs making phone calls; it should be able to give the notifier class a recipient and a message and be done with it. The problem is that the libraries used to send texts and phone calls and emails all have different interfaces and that makes our code a mess. To clean it up, we use the adapter pattern (also known as a wrapper) to make the interfaces to each of those libraries look the same. That lets us use each library without having a big ugly if statement with slightly different method calls for each type of notification we want to send.
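As a rough sketch of that notifier idea: TextLibrary and EmailLibrary below are made-up stand-ins for third-party libraries (real ones would actually send something, these just record what they were asked to do), and Notifier, TextNotifier and EmailNotifier are hypothetical names for the adapter side:

```java
import java.util.ArrayList;
import java.util.List;

// Made-up stand-ins for third-party libraries with mismatched interfaces.
class TextLibrary {
    final List<String> sent = new ArrayList<>();

    void sendSms(String phoneNumber, String body) {
        sent.add("text to " + phoneNumber + ": " + body);
    }
}

class EmailLibrary {
    final List<String> sent = new ArrayList<>();

    void deliver(String address, String subject, String body) {
        sent.add("email to " + address + " [" + subject + "]: " + body);
    }
}

// The one interface the rest of our code gets to know about.
interface Notifier {
    void send(String recipient, String message);
}

// Each adapter wraps one library and translates send() into that
// library's own method call.
class TextNotifier implements Notifier {
    private final TextLibrary library;

    TextNotifier(TextLibrary library) { this.library = library; }

    @Override
    public void send(String recipient, String message) {
        library.sendSms(recipient, message);
    }
}

class EmailNotifier implements Notifier {
    private final EmailLibrary library;

    EmailNotifier(EmailLibrary library) { this.library = library; }

    @Override
    public void send(String recipient, String message) {
        library.deliver(recipient, "Alert", message);
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        TextLibrary texts = new TextLibrary();
        EmailLibrary emails = new EmailLibrary();
        Notifier urgent = new TextNotifier(texts);
        Notifier routine = new EmailNotifier(emails);
        // The calling code looks the same no matter which library is underneath
        urgent.send("555-0100", "status check failed");
        routine.send("oncall@example.com", "something weird in the log");
        System.out.println(texts.sent);
        System.out.println(emails.sent);
    }
}
```

The code that decides to notify someone only ever sees Notifier, so adding phone calls later would mean one more adapter class rather than another branch in an if statement.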

The adapter and bridge patterns are pretty closely related and it’s not unusual to need both of them. In the web app type and theme example above we didn’t get too far into implementation details, but if we wanted to add a theme created by someone else we might need an adapter to fit the new theme into our existing bridge pattern.

Debugging tip of the day

loglevel="TRACE"

Alright, I guess I can give some details :)

It’s amazing how helpful just turning up your log level can be when you’re working on a weird bug. If something you can’t immediately explain is happening, try turning up your log level. In java, where I have the most experience, it’s unusual to run your production logging at a level more verbose than warn or debug. Normally you wouldn’t want extremely verbose logs, which are what you get when you turn up the log level, but sometimes you really need that extra information.

I wouldn’t normally think of turning up the log level, but we happened to have some trace level logging in our code when I ran into a weird bug. I thought it would be easier to turn up the logging than to change all the .traces to .debugs, and it turned out the debug level exception gave me much less information than the trace level exception, which had been swallowed because we were logging at the debug level. The trace level exception pointed me at the real bug, which turned out to be an obscure issue to do with my particular version of java having a weird interaction with a couple of libraries we’re using. Just because there’s no good reason for it to break doesn’t mean it won’t break :)
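To make the swallowed-message problem concrete, here’s a rough sketch using java.util.logging, which ships with the JDK. That’s a swap on my part: it isn’t the logging library from the story above, and its FINE and FINEST levels are only standing in for debug and trace. CollectingHandler and LogLevelDemo are names invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Collects log messages in a list so we can see what actually got logged.
class CollectingHandler extends Handler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
        messages.add(record.getLevel() + ": " + record.getMessage());
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }
}

class LogLevelDemo {
    // Runs the same failing code with the logger set to the given level
    // and returns whatever made it into the log.
    static List<String> run(Level level) {
        Logger logger = Logger.getAnonymousLogger();
        logger.setUseParentHandlers(false); // keep the console quiet
        CollectingHandler handler = new CollectingHandler();
        handler.setLevel(Level.ALL);
        logger.addHandler(handler);
        logger.setLevel(level);
        try {
            throw new IllegalStateException("the real cause");
        } catch (IllegalStateException e) {
            // The useful details are only logged at the most verbose level
            logger.log(Level.FINEST, "full details: " + e.getMessage(), e);
            logger.log(Level.FINE, "something went wrong");
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        // [FINE: something went wrong]
        System.out.println(run(Level.FINE));
        // [FINEST: full details: the real cause, FINE: something went wrong]
        System.out.println(run(Level.FINEST));
    }
}
```

Same code, same exception, but at the less verbose level the message that names the real cause never shows up in the log at all.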

The moral of the story is that your log doesn’t necessarily tell you everything and you should turn up your log level until you get answers.