melreams.com

Nerrrrd

What does SOLID really mean? Part 5

First, a quick recap: the SOLID principles of object oriented design are, to quote Uncle Bob:

The Single Responsibility Principle A class should have one, and only one, reason to change.
The Open Closed Principle You should be able to extend a classes behavior, without modifying it.
The Liskov Substitution Principle Derived classes must be substitutable for their base classes.
The Interface Segregation Principle Make fine grained interfaces that are client specific.
The Dependency Inversion Principle Depend on abstractions, not on concretions.

Last time I talked about the fourth letter in SOLID, the Interface Segregation Principle. Now I’m moving on to the Dependency Inversion Principle.

The Dependency Inversion Principle says you should depend on abstractions, not concrete classes. Great, what does that mean? Basically that you want to hide the details of what you’re doing not just behind a separate class but behind an interface so you don’t even have to know which class is actually doing the work.

If you only have one class that actually does the work, this probably seems like a total waste of time. Honestly, for some situations it probably is. If there are business reasons that you’re never, ever going to change database vendors, then don’t worry too much about hiding which database driver you’re using. In other situations where things might change or will definitely change (which is most situations, if requirements would just stay put software would be easy), dependency inversion can really help you out.

Unrelated photo from Pexels
Unrelated photo from Pexels

Let’s take sending email as an example. In a web app that you sell to other businesses, you often need to notify their customers of things directly – if you sell an appointment reminder system the entire point is that your customer doesn’t have to manually send emails to their customers, your app takes care of that for them. Sending email sounds simple enough, right? Either you set up your own SMTP server and send emails directly or you use a service like MailChimp or SendGrid or Amazon SES or Mailgun or ___ and you leave it alone.

Not so fast! What if some of your customers want to send email through their own SendGrid account so they can customize their own emails without going through you and see all their stats and everything? What if other customers already have their own SMTP server and want to send email through that? Now you’ve really got to hide all the details so that your code can trigger an email without even knowing whether that email is going to be send directly to a mailserver or to a service like Mailgun.

If you built your app following the dependency inversion principle from the get-to, this is going to be really simple. All you have to do is add another implementation of the email handler interface you already have and you’re set. Best of all, you know you didn’t break your existing email handling because it’s in a separate class that you don’t have to mess with.

If you let your app depend directly on one email service, though, you’ve got a mess to deal with. Not only do you have to add another email handler, but you have to make pretty major changes to your code to pull your existing email handling into a separate class. This can really suck if you let your code deal with too many implementation details, like how to react to different error codes. It also makes the change riskier and more expensive (in both time and money) because any time you change existing code you might introduce new bugs and because you’ll need to retest all of the existing email handling as well as the new feature to make sure everything still works.

Even if you doubt a certain feature is going to change, it’s still worth thinking about dependency inversion. If the code that triggers an email can only talk to an interface, that’s going to change the way you pass along data like the to and from addresses. It’s also going to change how you report and recover from errors. You might still decide to let your code depend directly on your email service, which is perfectly fine if you’ve thought that decision through. The Dependency Inversion Principle isn’t mean to be an ironclad rule, it’s just a principle to help you avoid painting yourself into a corner.

That’s it for SOLID! If there’s a particular design principle you’d like me to cover next, let me know in the comments.

What does SOLID really mean? Part 4

First, a quick recap: the SOLID principles of object oriented design are, to quote Uncle Bob:

The Single Responsibility Principle A class should have one, and only one, reason to change.
The Open Closed Principle You should be able to extend a classes behavior, without modifying it.
The Liskov Substitution Principle Derived classes must be substitutable for their base classes.
The Interface Segregation Principle Make fine grained interfaces that are client specific.
The Dependency Inversion Principle Depend on abstractions, not on concretions.

Last time I talked about the third letter in SOLID, the Liskov Substitution Principle. Now I’m moving on the the Interface Segregation Principle.

Another way to state the Interface Segregation Principle is that no client should be forced to depend on methods it does not use (thanks wikipedia). That is, if you have methods in your interface that are different enough that no single client would use both of them, those methods probably belong in separate interfaces.This is similar to but not quite the same as the Single Responsibility Principle – a class can have a single responsibility and still have public methods that will be used by some clients but not others.

Take this example of a job class from the wikipedia page on the Interface Segregation Principle.

The ISP was first used and formulated by Robert C. Martin while consulting for Xerox. Xerox had created a new printer system that could perform a variety of tasks such as stapling and faxing. The software for this system was created from the ground up. As the software grew, making modifications became more and more difficult so that even the smallest change would take a redeployment cycle of an hour, which made development nearly impossible.

The design problem was that a single Job class was used by almost all of the tasks. Whenever a print job or a stapling job needed to be performed, a call was made to the Job class. This resulted in a ‘fat’ class with multitudes of methods specific to a variety of different clients. Because of this design, a staple job would know about all the methods of the print job, even though there was no use for them.

The solution suggested by Martin utilized what is called the Interface Segregation Principle today. Applied to the Xerox software, an interface layer between the Job class and its clients was added using the Dependency Inversion Principle. Instead of having one large Job class, a Staple Job interface or a Print Job interface was created that would be used by the Staple or Print classes, respectively, calling methods of the Job class. Therefore, one interface was created for each job type, which were all implemented by the Job class.

Just because the Job class only changes when when we have a new or different type of job doesn’t mean the interface isn’t a mess. Of course, you could also argue that “Job” is too broad and that the Job class does have multiple responsibilities because a a staple job and a print job are separate things, but I think there’s still something to be gained from looking at the breadth of your interface and thinking about whether it needs to be broken up into separate interfaces.

Even if you have a single class that implements all of those interfaces, it’s still cleaner for the clients of that class only to know about the methods they actually need. The more things your interface does, the more likely that separate clients accidentally get tangled up because it’s so easy to just call another method on an interface you already have access to. Splitting your interface into separate pieces forces you to think about what each client really needs to have access to and whether you’ve split your clients up the right way.

In most cases, it’s probably better to let separate subclasses implement the different parts of each single purpose interface. If two clients are different enough to use completely separate interfaces, then a single change probably should not affect them both. Sometimes the change you need to make is at such a fundamental level that it is reasonable for all clients to be affected, but that’s something you should avoid if at all possible. Programming: where there’s never a simple right answer.

Another reason to have smaller interfaces is to make life easier for maintenance programmers :) The more methods you have in an interface, the harder the maintenance programmer has to work to figure out which one is actually right for what they’re doing. That might sound silly, but take a look at the Java Collections API. Collections are meant to be generic so they do need a pretty broad interface but that’s still a lot of stuff to dig through when you just want to know which method to use to update some of the elements in your collection.

Next up, the last letter in SOLID: the Dependency Inversion Principle.

What does SOLID really mean? Part 3

First, a quick recap: the SOLID principles of object oriented design are, to quote Uncle Bob:

The Single Responsibility Principle A class should have one, and only one, reason to change.
The Open Closed Principle You should be able to extend a classes behavior, without modifying it.
The Liskov Substitution Principle Derived classes must be substitutable for their base classes.
The Interface Segregation Principle Make fine grained interfaces that are client specific.
The Dependency Inversion Principle Depend on abstractions, not on concretions.

Last time I talked about the second letter in SOLID, the Open Closed Principle. Now I’m moving on to the Liskov Substitution principle.

The Liskov Substitution Principle is named that because it was created by Barbara Liskov, currently an institute professor at the Massachusetts Institute of Technology and Ford Professor of Engineering in its School of Engineering‘s electrical engineering and computer science department. The principle also doesn’t lend itself well to a short description, so it’s just easier to name it after the person who invented it.

The Liskov substitution principle says that it must be possible to substitute a subclass for the base class without changing anything. Basically, your object hierarchy

Unrelated photo by Zukiman Mohamad
Unrelated photo by Zukiman Mohamad

should make sense :) If you have a subclass that requires special handling and can’t just be dropped in where the base class is used, something is wrong with your design. Why is that so bad? Because it means every time you use that subclass you have to remember to add the special handling bit and/or remember which subclass has which side effects.

If you need some of the functionality of the base class but you have to do some special stuff that means you can’t just create a subclass that is substitutable, create a new class and give it an instance of the base class to use. If it can’t behave like a real subclass, don’t try to force it to be one, it’s just going to cause trouble in the long run.

The typical example of a Liskov Substitution Principle violation is a Square class and a Rectangle class. If they both have setters for width and height, then you can get yourself into trouble if the calling code got a rectangle when it expected a square or vice versa. Say you’re trying to lay out a screen and you know you have a space left that’s x by y so you set your screen object’s width to x and its height to y. If your screen object is a rectangle, everything is cool. But if your object is a square, suddenly its width also got set to y when you set its height to y. Now your layout is all messed up and you’re frustrated because your code looks perfectly reasonable even though it’s clearly not working correctly.

Another way to state the Liskov Substitution Principle is that your code shouldn’t contain surprises. No matter how sensible and obvious something seems while you’re writing it, in six months when you come back to add a new feature you will have forgotten all the details. If your code doesn’t have surprise side effects or special handling, then you’re much more likely to be able to add that new feature quickly and move on. If you run into a surprise, you could spend ages figuring out why the code behaves that way.

If you have class hierarchies in your code, be nice to your future self and obey the Liskov Substitution Principle.

What does SOLID really mean? Part 2

First, a quick recap: the SOLID principles of object oriented design are, to quote Uncle Bob:

The Single Responsibility Principle A class should have one, and only one, reason to change.
The Open Closed Principle You should be able to extend a classes behavior, without modifying it.
The Liskov Substitution Principle Derived classes must be substitutable for their base classes.
The Interface Segregation Principle Make fine grained interfaces that are client specific.
The Dependency Inversion Principle Depend on abstractions, not on concretions.

Last time I talked about the first letter in SOLID, the Single Responsibility Principle. Now I’m moving on to the Open Closed Principle.

The Open Closed principle says that you should be able to extend a class’s behaviour without modifying that class directly. In other words, the class should be open for extension and closed for modification. Okay cool, but what does that really mean? That you should design your classes in a way that when you need to add new features to your system you can do it by adding new code without messing with existing code that already works. Remember, most of software development is accepting that it’s hard and trying not to completely screw it up. When you change code that already works, you risk breaking everything that depends on it. It’s safer to add new code to a child class (for example) that’s separate from the existing code so you can break your new feature without wrecking everyone else’s day.

Photo so this blog post looks nicer in social media shares. Photo by CreateHER Stock shared under a Creative Commons Attribution-ShareAlike 4.0 International License.
Photo so this blog post looks nicer in social media shares. Photo by CreateHER Stock shared under a Creative Commons Attribution-ShareAlike 4.0 International License.

Open/closed is about how you arrange your abstractions. The name Open Closed Principle can be kind of confusing, it’s not about somehow preventing other people from changing existing code with stern comments threatening to replace their good chair with the crappy broken one that got abandoned in the conference room, it’s about writing your code so other people (or you in six months) don’t have to change the existing code. It’s the not needing to change your code part that makes your class closed to modification.

For example, suppose you have some classes for employees and contractors, and a report building class that calculates everyone’s pay for the month. If that report building class has an if or case statement that checks whether the current person to calculate pay for is an employee or a contractor and handles them differently, then your class has to be updated every time you add a new type. Maybe you need to handle interns now, or both salaried and hourly employees, or full and part time employees, or sales people who get paid different commissions, or or or. The more employee types you add the bigger and uglier your case statement gets and the more chance you’ll forget one or mess it up.

Instead, you need an interface named PayableIndividual with a method called calculatePay(Date start, Date end). Then your concrete classes like FullTimeEmployee and Contractor can implement that interface. If your report generating class only uses PayableIndividuals, not the concrete classes that implement that interface, you an add all the subtypes you want without ever having to mess with the report generator because all it has to do is call calculatePay and let the concrete class do the work.

The open for extension part of the Open Closed Principle is about keeping processing that’s specific to an individual piece of code separate. It might be tempting to make that calculatePay method I mentioned above a method in an abstract PayableIndividual class and let the subclasses just add what they need to after they call the base calculatePay method. If you really do have a base hourly rate everyone gets paid that could work, but if, say, some contractors get paid by the hour and some sell blocks of work for a fixed price then you’ll have to tear out the base calculatePay method that doesn’t make sense anymore. If it can vary, separate it out so that you can simply override it in a base class without reworking your entire design.

Of course, you can’t have code that perfectly adheres to the open closed principle if it actually does anything useful. Sooner or later you’ll run into a problem that just doesn’t bit perfectly into your nice tidy design. For example, you might want your salary report ordered by employee type and then name. Whatever code does the ordering has to change when you add new employee types – all you can do is keep the mess as contained as possible. If you use a comparator with a table that contains the order all the employee types should be printed in then only that table has to change when you add a new employee type. Not perfect but it could be much worse.

So why is it so bad to modify a base class? It’s a risk. Every time you modify a class, you risk breaking everything that depends on it. That’s bad enough if it’s a class used only by one project, but it’s really scary if it’s a class in a shared library. If you could break every project that uses that library, then every project needs to be retested and that can be massively expensive in terms of both time and money. It can also make other developers hate you, which is exactly what we’re trying to avoid with these principles :)

Next up, the Liskov Substitution principle!

What does SOLID really mean? Part 1

Not so long ago I read an article about SOLID design principles in Clojure and started thinking it would be interesting to talk about those principles more generally. I don’t know about you, but I have a terrible habit of skimming over stuff like that thinking “oh sure, SOLID, that sounds like a good idea” and then promptly forgetting all about it.

According to Wikipedia, Robert C Martin (aka Uncle Bob) came up with the SOLID design principles in the early 2000s and Michael Feathers came up with the mnemonic acronym to help people remember them. The SOLID principles are meant to help people design code that’s easy to maintain and extend, which keeps future maintenance programmers from wanting to throw chairs at them :)

Uncle Bob has an excellent summary of what SOLID stands for on his site, so I’ll just quote him here:

The Single Responsibility Principle A class should have one, and only one, reason to change.
The Open Closed Principle You should be able to extend a classes behavior, without modifying it.
The Liskov Substitution Principle Derived classes must be substitutable for their base classes.
The Interface Segregation Principle Make fine grained interfaces that are client specific.
The Dependency Inversion Principle Depend on abstractions, not on concretions.

Let’s start with the S, the single responsibility principle. Why is it so important that a class only have one reason to change? Because that means it’s only responsible for one thing. Classes that are responsible for only one thing have conceptual integrity, which is an enormous part of writing code than anyone else can ever use and one of the most important things The Mythical Man Month talks about. Conceptual integrity is kind of a tough concept to nail down, though. I would say that a project that has a high level of conceptual integrity is consistent, it has a predictable design. All of the classes at any given layer of your architecture should behave largely the same way so another programmer doesn’t have to spend hours figuring out how each individual class behaves so she can add her feature.

A surprising number of best practices in software development are about accepting that development is incredibly easy to screw up and attempting to make it harder to make a complete hash of things. If you’re working on a very small project like an assignment in college, it really doesn’t matter whether your project has conceptual integrity. If it’s simple enough that you can hold the whole thing in your head or if it’s something you spend a week building before you hand it in and then never look at it again, you can get away with whatever ridiculous mish-mash of design concepts you want. Where conceptual integrity becomes a huge fucking deal is when you’re maintaining a large system over multiple years.

College/university/bootcamps are great and you’ll learn lots, but one thing that’s very difficult to teach in a limited amount of time is what it’s actually like to maintain an existing system. All of the things you can get away with in tiny throwaway projects just do not work on larger projects that might survive for ten years or more. Every tiny mistake that you make in the beginning gets magnified by years and years of decisions built on top of it.

For example, at a previous job I chose a convoluted hash structure to store the data I needed to build an interface. It seemed like a good idea when I built it, but as the requirements changed – requirements always change, don’t kid yourself on that front – the cracks started to show. I had come up with a brittle design that was extremely difficult to extend. As I tried to force my data structure to support more and more features it just became more and more of a mess. I ended up having to apologize a whole lot to the dev who ended up maintaining that monstrosity after I left and I wouldn’t have been surprised if he had to scrap the whole thing and rebuild it sensibly.

Even if conceptual integrity doesn’t seem like a big deal at this point in your studies or career, trust me, it will come back to bite you soon enough. The more things your class does the longer it takes another dev to figure out how to use it. That other dev could also be you six months from now when you’ve forgotten all the fiddly little details of how you built that class. The less predictable your codebase is, the more time you waste re-learning things every time you use a given class.

It probably doesn’t sound that bad to have to relearn one class before you use it. It only takes a few minutes, right? Where things get ugly is where you have to relearn “just one class” five times to get a feature done or even worse, when you didn’t realize you needed to relearn that class because the whole rest of that layer worked the same way so you reasonably assumed that class did too. You get some really interesting bugs when just one of these things isn’t like the others and what’s worse, they’re especially difficult to catch with unit tests because hardly anyone thinks to test for the possibility that this one class doesn’t behave like all of the other classes like it.

The more areas a class has responsibility for, the less predictable it is. If I have a service that retrieves data from a store (maybe it’s a relational db, maybe it’s a nosql db or a cache), I can pretty safely make assumptions about what the get and save methods do. But if I have a service that retrieves data and formats it for display, I can’t know it formats things the same way as all the other controller classes or if it does any special business logic until I go and look.

The more areas a class has responsibility for, the more other classes have to change if it changes. A class that just represents database data only changes if we add new fields to the database. If we add a new field, it’s completely logical and predictable that other classes that use that class might need to change to handle the new field. If a class handles retrieval and some business logic, then it’s much harder to find all of the classes that need to change if that class changes and it’s much more likely that we might miss something that needs to change too. It sounds stupid if you’re used to teeny tiny projects, but this is stuff that actually happens when you have a codebase of over a thousand classes. The more your project does, the more careful you have to be about keeping everything tidy. Think about that poor maintenance programmer and don’t make her wish she could hunt you down and pelt you with tomatoes.

As much as we like to think programming is a solitary activity, a huge amount of professional programming is actually about not being a complete asshole to the devs who will come after you :)

The open/closed principle is coming soon!

Java best practices: synchronization

Java’s synchronization can be really helpful, but it can also get you into plenty of trouble. Synchonization is in no way a magic wand that you can wave around to get rid of multi-threading issues, you have to understand how to use it.

In java (and many other languages, but java’s what I’m familiar with), synchronization prevents threads from accessing the same data at the same time. Concurrency (multiple threads sharing access to the same variables) is a gigantic subject, so I’m going to gloss over it here by saying that things can go wrong in deeply weird ways when threads accidentally overwrite each other’s updates to a variable or work from different copies of the shared variable. Synchronization can stop that from happening if you use it correctly, but at the cost of a hit to performance and the need to be very very careful that you don’t introduce deadlocks.

public synchronized void example() {
   //do things
}

Using the synchronization keyword on a method (like in the example above) synchronizes access to that entire method, which is generally pretty safe but unless you have a very small method you may be synchronizing a bigger chunk of code than you absolutely need to, which is more of a performance hit than necessary. Because synchronized blocks/methods can only be accessed by one thread at a time, they really slow down processing. The larger a chunk of code you synchronize, the worse the performance hit is.

class Example {
   Message m;

   public Example(Message m) {
       this.m = m;
   }

   public void doThings() {
       String name = Thread.currentThread().getName();
       synchronized(m) {
           //actually do things with m
       }
   }
} 

The synchronization method, while it makes it easier to synchronize only the part you need, also makes it easier to mess things up by introducing a deadlock. A deadlock happens if thread A needs locks on objects Y and Z and thread B needs locks on objects Z and Y in that order. If A locks Y and waits for Z to be unlocked, and B locks Z and waits for Y to be unlocked, both threads wait forever and nothing happens until you restart your program. If you lock on multiple objects (which you should definitely do if you need to update multiple shared objects in the same block of code), make sure that you absolutely always lock on those objects in the same order. The same problem applies to mysql deadlocks, which can really suck to debug if your codebase is large enough.

While we’re at it, according to stack overflow, synchronized(this) can be dangerous because it synchronizes on the entire instance. If you have another block that synchronizes on this, it can’t run until the other lock on this unlocks. It also means any external locks on that object can’t run until it’s unlocked, which can cause serious performance problems if you do it enough.

Aside from being very careful when you do use synchronized, the best advice I can give you is to use it as little as possible. If you can, just don’t have shared state. Particularly in web programming, you generally shouldn’t keep state around for longer than it takes to process a request.

Finally, if you use synchronized and mess it up, don’t waste time beating yourself up about it. Concurrency is even worse than timezones and everyone messes it up sometimes.

Different languages are good for different things

As you learn to code and learn new programming languages you’ll often hear that different languages are good for different things. Technically you can do just about anything in any language, so for a long time that never meant much to me. Once you get past basic conditionals and loops, there actually are pretty major differences in how easy it is to do different things in different languages.

Here’s a handy example: the other day I wanted to figure out how much I spend on average each month so I could figure out how much I can reliably throw into my RRSP. Okay, use mint, you say. Not so fast there! I only wanted to know about my expenses NOT including RRSP and TFSA contributions, and I wanted to leave out the month I got married because it’s a huge outlier and screws up my average :) If you can get mint to do that, I’d love to hear how.

What I ended up doing was downloading my transaction history as a csv from my bank and manually removing the stuff I didn’t want to include. Then I needed to create monthly totals (so I could see if those looked reasonable) and an overall average somehow. I was hoping I could do that with a simple formula in a spreadsheet, but after fiddling with it for a bit I decided I’d rather poke myself in the eye than stick with that idea.

Python to the rescue! Not so long ago I was a mentor at a Ladies Learning Code workshop about data processing with Python. At the end of that workshop we ended up with a little script that read in a csv, did some processing, and output the results, which is exactly what I needed. I started with that script and ended up with this:

# Import the csv library
import csv
import datetime

# Open the statement file
statement_file = open('./statement.csv')

# Convert it to a csv_data structure
statement_data = csv.DictReader(statement_file)
current_month = -1
current_year = -1
months = 0
grand_total = 0.0
running_total = 0
# Loop through each of the rows
for transaction in statement_data:
    # deposits have a blank in the withdrawal field, we only want withdrawals
    if transaction['withdrawal'] is not '':
        #convert the string date to a date object so we can get the month
        date = datetime.datetime.strptime(transaction["date"], '%d-%b-%Y')
        #every time we hit a row where the month doesn't match the month from
        #the last row we know it's a new month and we need to update current
        # month & year and increment the month count
        if date.month != current_month:
            if current_month > -1:
                months += 1
                #print current_month instead of date.month because date.month
                #is the new month
                print(str(current_month) + "-" + str(current_year)
                      + " monthly total: " + str(running_total))
            current_month = date.month
            current_year = date.year
            running_total = 0
        running_total += float(transaction["withdrawal"])
        grand_total += float(transaction["withdrawal"])

# one more print statement for the last month in the file
print(str(current_month) + "-" + str(current_year) + " monthly total: "
      + str(running_total))

average = grand_total / months
print("avg: " + str(average) + " over " + str(months) + " months")

Then I started thinking, that was weirdly easy considering that since college I’ve touched Python twice – once while preparing for that Ladies Learning Code workshop and once while actually mentoring at the workshop. That made me wonder how Java, the language I’ve used just about every day at work for the last nine years, would compare. So I ported my Python script to Java and this is what I ended up with:

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

public class Calc {
    public static void main(String[] args) {
        try {
            int monthCount = 0;
            int currentMonth = -1;
            int currentYear = -1;
            float grandTotal = 0;
            float runningTotal = 0;
            // open the statement csv
            Reader in = new FileReader("statement.csv");
            // parse it into CSVRecords so we can get values out more easily
            // unlike python this CSV library doesn't seem to automagically
            // figure out what a header row is so I had to add the headers
            // manually
            Iterable<CSVRecord> records = CSVFormat.DEFAULT.withHeader(
                    "account", "date", "desc", "num", "withdrawal", "deposit",
                    "balance").parse(in);
            // loop through each of the rows
            for (CSVRecord record : records) {
                String dateStr = record.get("date");
                String withdrawalStr = record.get("withdrawal");
                // deposits have a blank in the withdrawal field, we only want
                // withdrawals
                if (withdrawalStr != null && !withdrawalStr.equals("")) {
                    // java requires a lot of boilerplate around parsing a
                    // string into a date that we can get a month out of
                    DateFormat df = new SimpleDateFormat("d-MMM-yyyy");
                    Date transactionDate = df.parse(dateStr);
                    Calendar cal = Calendar.getInstance();
                    cal.setTime(transactionDate);
                    // every time we hit a row where the month doesn't match 
                    // the month from the last row we know it's a new month 
                    // and we need to update the current month and increment 
                    // the month count. technically we can get the month 
                    // using transactionDate.getMonth() but that method is 
                    // deprecated and I'm trying to set a good example
                    if (cal.get(Calendar.MONTH) != currentMonth) {
                        monthCount++;
                        if (currentMonth > -1) {
                            // in java months start from 0, add 1 so we get
                            // nicer looking output
                            System.out.println((currentMonth + 1) + "-"
                                    + currentYear + " monthly total: "
                                    + runningTotal);
                        }
                        currentMonth = cal.get(Calendar.MONTH);
                        currentYear = cal.get(Calendar.YEAR);
                        runningTotal = 0;
                    }

                    float withdrawal = Float.parseFloat(withdrawalStr);
                    grandTotal += withdrawal;
                    runningTotal += withdrawal;
                }
            }
            // one more print statement for the last month in the file
            System.out.println((currentMonth + 1) + "-" + currentYear
                    + " monthly total: " + runningTotal);
            float average = grandTotal / monthCount;
            // the one convenient thing java does here is 'autoboxing' - it
            // automatically converts non-strings into strings when you try to
            // add them to a string
            System.out.println("avg: " + average + " over " + monthCount
                    + " months");
        } catch (IOException | ParseException e) {
            e.printStackTrace();
        }
    }
}

In a word, ugh. File processing scripts are not even slightly what Java is good at. Everything I needed for the Python script was part of the Python language. For the Java version, I had to go hunt down a library and add it to my project, which required knowing that there probably was a library, knowing how to add it to my build path, and figuring out how to use it.

Even the least terrible csv library I was able to find for Java inside of five minutes of googling (Apache Commons CSV, if you’re curious) was much harder to use as Python’s builtin csv handling. Java’s date parsing also requires way more steps than Python’s does. And to run this in Java you have to know about main methods and all the boilerplate around them. Even if you just let your IDE generate that for you, you still need to know it exists and what it’s for.

Basically you have to fight Java to do something like my average monthly spending script. You can still do it, but it’s much more work than it has to be. Java is great for big enterprisey systems with APIs and multiple programmers working on different pieces, but it’s kind of painful for little scripts to parse a csv and do some processing. Python, on the other hand, rocks at stuff like that. I hope this helps you understand what people actually mean when they say different languages are good for different things.