Diffs in Unit Tests

While working on my refactorable settings code extension, I needed to verify that the output of the tool matched some sample output from the original settings code generation tool. Unfortunately, Assert.AreEqual on two really long strings of generated code doesn't give you a whole lot of helpful information about exactly which characters don't match (especially if non-printing characters are the problem). A little googling turned up an amazingly useful solution from Phil Haack - a set of classes which display a character-by-character diff of the two strings.

I actually ended up using this slightly modified version by Sina Iravanian. If no one else has put this into a NuGet package by the next time I need it, I'm going to have to do it myself; it's too incredibly useful to only be available for cut-and-paste.
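For flavor, here's a minimal sketch of the idea (this is not the Haack/Iravanian code - StringAssertEx and AreEqualWithDiff are names I've made up): an assertion helper that reports the first mismatching index and makes non-printing characters visible in the failure message, which is most of what you need when two blobs of generated code refuse to compare equal.

    using System;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    public static class StringAssertEx
    {
        public static void AreEqualWithDiff(string expected, string actual)
        {
            // Walk the shared prefix and fail on the first character that differs.
            int shared = Math.Min(expected.Length, actual.Length);
            for (int i = 0; i < shared; i++)
            {
                if (expected[i] != actual[i])
                {
                    Assert.Fail("Strings differ at index {0}: expected {1}, actual {2}.",
                        i, Describe(expected[i]), Describe(actual[i]));
                }
            }

            // Same prefix, different lengths - say so instead of just "not equal".
            if (expected.Length != actual.Length)
            {
                Assert.Fail("Strings match for the first {0} characters, but lengths differ ({1} vs. {2}).",
                    shared, expected.Length, actual.Length);
            }
        }

        // Render non-printing characters as escapes so they show up in test output.
        private static string Describe(char c)
        {
            switch (c)
            {
                case '\r': return "'\\r'";
                case '\n': return "'\\n'";
                case '\t': return "'\\t'";
                case ' ':  return "'<space>'";
                default:
                    return char.IsControl(c)
                        ? string.Format("'\\u{0:x4}'", (int)c)
                        : string.Format("'{0}'", c);
            }
        }
    }

In a test you'd call StringAssertEx.AreEqualWithDiff(expectedCode, generatedCode) anywhere you'd otherwise have called Assert.AreEqual.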

Code Reviews Should Produce Automated Tests

The other day I was reading an article about code reviews that I saw on reddit and it got me to thinking about some of the problems I've seen with code reviews over the years.

The first is issues that both the reviewer and reviewee agree on, but which never get fixed. Basically, during the review the reviewer points out an issue, and the reviewee says, "Yep, that's a problem. I'll fix that." Then the reviewee gets caught up in the usual day-to-day distractions, and the issue never gets addressed. This problem shows up more in face-to-face reviews; asynchronous processes that use code review tools (especially ones where the reviewer has to sign off on check-ins) are better about this sort of thing. However, in both cases a reintroduction of the issue into the same part of the codebase later can slip by (unless you do comprehensive code reviews, and not the spot-check sorts of reviews that most of us have time for).

The second is communication during disagreements - the reviewer and reviewee may not agree that something is an issue, and a communication barrier makes hashing out the issue difficult. This problem has a couple of sub-genres:

  1. Theoretical performance issues - the reviewer is pretty sure there's a potential performance problem, and the reviewee doesn't agree. These often end up in whiteboard arguments where you each try to use the bits of big-O notation you sort of remember from CS233. They may or may not get resolved, and the resolution might be incorrect.

  2. The reviewer and reviewee don't realize that they're not discussing the same issue. This one is a problem with using English (or any other human language) to talk about technical processes. Problems with vocabulary, semantics, and possibly the fact that it's not someone's native language can make this tough.

These types of communication issues are time consuming, but not necessarily dangerous. Since engineers generally want resolutions to arguments, eventually things will get worked out.

The third, and potentially most dangerous, is the situation where there's a communication problem that's not apparent. The reviewer has pointed out a problem, and the reviewee agrees that it needs to be addressed. But the reviewee's understanding of the issue is either incomplete or wrong. Now everyone is under the false impression that the review has successfully addressed a problem that's still lurking in the code.

So what if, rather than trying to communicate issues in a code review with human languages, we did it in a programming language? In other words, what if instead of producing comments, our code reviews produced automated tests?

Instead of pointing out that a colleague's method isn't validating its arguments properly (or adding a comment to that effect in the code review tool), I could simply write a unit test that demonstrates the incorrect behavior. Instead of pointing out an edge case that he missed, I could write a unit test that sets up that edge case and tests it. If I think that my colleague's data access strategy isn't going to scale, I could write an acceptance test that shows what happens when the table has a production-level number of records (and that his data access strategy doesn't fall within acceptable response times).
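To make that concrete, here's roughly what the first two of those might look like. SettingsParser and ParseSettings are hypothetical stand-ins for whatever code is actually under review, and the expected behavior is whatever the reviewer and reviewee would otherwise have argued about in comments:

    using System;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class SettingsParserReviewTests
    {
        // "This method doesn't validate its arguments" expressed as a failing test.
        [TestMethod]
        [ExpectedException(typeof(ArgumentNullException))]
        public void ParseSettings_WithNullInput_ThrowsArgumentNullException()
        {
            var parser = new SettingsParser();
            parser.ParseSettings(null);
        }

        // "You missed the empty-input edge case" expressed as a failing test.
        [TestMethod]
        public void ParseSettings_WithEmptyInput_ReturnsNoSettings()
        {
            var parser = new SettingsParser();

            var settings = parser.ParseSettings(string.Empty);

            Assert.AreEqual(0, settings.Count);
        }
    }

Either test gets checked in with the review, so the expected behavior stays pinned down long after the review conversation is over.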

This is probably not an original idea; I haven't seen it done anywhere, but I have to assume someone else in the great wide world has come up with it before. If they have, and there's a name for it, I'd love to hear about it. I haven't tested this idea yet; I'm going to attempt it next time I'm doing a code review. I'm enthusiastic about the idea, because I think it has a lot of possible benefits:

  1. The code review output is useful. It increases your test coverage, and keeps the same issues from coming up again later.

  2. It's "provable". It's an ironclad argument that the issue you're raising really is an issue. Conversely, if you can't write a failing test for it, then the issue probably wasn't really an issue, and your colleague needn't waste time addressing it.

  3. It's clearly communicated. By encoding the concern in a programming language, you avoid many of the possible communication issues between two humans.

  4. You can definitively settle theoretical performance arguments. Instead of whiteboard arguments and incorrect assumptions about scaling, you end up with a repeatable acceptance test (there's a sketch of one after this list).

  5. You guarantee that the issue will actually be addressed. Too many code reviews (especially face-to-face ones) end with "Okay, I'll fix that", followed by the usual parade of distractions which mean that the issue is promptly forgotten. A failing test is a good (and very difficult to ignore) reminder.
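For the performance case in item 4, the repeatable acceptance test might be nothing fancier than a stopwatch around the code in question. Here CustomerRepository, GetRecentCustomers, the million-row table, and the 500 ms budget are all hypothetical stand-ins for whatever the real code and the team's agreed response time actually are:

    using System.Diagnostics;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class CustomerRepositoryAcceptanceTests
    {
        // "I don't think this query will scale" expressed as a repeatable test:
        // load a production-sized table, then assert the query stays within the
        // response time the team has agreed on.
        [TestMethod]
        public void GetRecentCustomers_WithProductionSizedTable_StaysUnderResponseTimeBudget()
        {
            var repository = new CustomerRepository();
            InsertTestCustomers(repository, 1000000);  // production-level row count

            var stopwatch = Stopwatch.StartNew();
            repository.GetRecentCustomers();
            stopwatch.Stop();

            Assert.IsTrue(stopwatch.ElapsedMilliseconds < 500,
                "Query took {0} ms; the agreed budget is 500 ms.",
                stopwatch.ElapsedMilliseconds);
        }

        private static void InsertTestCustomers(CustomerRepository repository, int count)
        {
            for (int i = 0; i < count; i++)
            {
                repository.Insert(new Customer { Name = "Customer " + i });
            }
        }
    }

Whether the threshold is right is still a conversation, but at least it's a conversation about a number you can measure again next month.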

Now, I can predict a couple of concerns that people might have with this:

For one, it pretty much has to be asynchronous. If face-to-face code reviews are the most effective process at your organization, then this might not work for you.

Honestly, I hate face-to-face code reviews. They're awkward, and often the discussion is lower quality because any arguments have to be made on the fly; there's no time to really think carefully about why a particular method or algorithm or name is or is not optimal. And unless at least one person at the review is a good note taker (or you record the whole session), chances are some of the discussion is lost down the memory hole by the time everyone is back in their office coding again. So I much prefer asynchronous code review tools like Crucible. You have a record of everything that happens in the code review, you have time to think, and you don't actually have to be in the room with another human. Even if you don't adopt the test-writing review, I'd strongly urge you to give asynchronous reviews a try. Heck, there's no reason that you couldn't supplement your face-to-face reviews with test-writing reviews.

Another objection that might come up: this doesn't cover things like coding style issues or naming conventions. That's true. Again, you don't have to do this exclusively - combine it with other code review processes. Also, I'd point out that coding style/naming convention reviews can essentially be done with software (e.g., StyleCop). If you're using developer time for this, you may want to think about supplementing the reviews with an automated tool.

Like any other software development process, there is no "one true way" that will work for everyone. This is just meant as a suggestion, something to try in addition to whatever you already do. If it works for you, great - I'd love to hear about it. And if it doesn't, I'd love to hear about that, too. I'll definitely be giving it a shot in my next review.