Monday, 11 June 2018

What I want from a code review tool

Many of my friends and colleagues will have heard me decry the state of code review tooling in the open source world. Here, I try to collect my thoughts on what I feel is missing; maybe it will help motivate some of the big players in that direction, maybe it will serve as a roadmap for my next side-project.

I'm going to assume that as a reader, you're bought into the general idea of code review: a place for people to get constructive feedback about a change they want to make (a "Pull Request") before that change is merged onto a target branch.

So what do I want from a code review tool?

As a code author, I want to easily see the status of my change. What needs to happen before I'm comfortable merging it? Some examples:

  1. Have my reviewers looked at the code yet?
    • If not, I probably don't want to merge my code (obviously, there are exceptions).
    • There is some nuance here - as a code author, am I just looking for someone to look over my code, just to sanity check it? Am I looking for Janet to review this code, because we had a discussion about the design, and she's best placed? Do I also want Sophie to also take a look, because she probably knows a better language construct I could be using? Whose review is sufficient?
    • There's more nuance that gets added from the reviewer side; I've yet to use a code review tool which has relationships between reviewers: "I'm ok with it if Sophie is". It's hard to express these concepts elegantly, but I suspect it can be done.
  2. Have any of my reviewers left comments which I haven't addressed? Which comments still need addressing?
    • My general rule of thumb is that all comments should either be addressed in code, or explained as to why they're not.
    • There is some nuance here - some comments are trivially addressable; some may need discussion, or I may try to address them, but not do a great job of it, and need follow-up. Ideally, the nuance of "I've left a comment, feel free to ignore it" vs "I've left a comment, I'm sure you'll address it fine" vs "I've left a comment, but would like to have another look before it's considered addressed" would be captured, in a low-friction way.
  3. If any tests have been run, have any failed? If so, I probably want to investigate that before I merge my code.
I've yet to use a code review tool in the open source world which actually meets these three criteria. Most shy away from modelling the social interactions and connotations of code review (things like "I'm ok with it if Sophie is"), and instead try to dump all of the information that could be relevant at you. Saying in one place "Janet has accepted, Sophie hasn't reviewed yet", and in another "Janet mentioned half-way through a paragraph that Sophie should probably take a look). Worse - many of them hide information they deem out of date. When nuance can only be expressed through text, it's great to have the text there for people to read, but wouldn't it be nice if a green tick meant "Good to merge", rather than maybe "Good to merge" or maybe "Good to merge if you make this change" or maybe "Good to merge but it would be great for Sophie to have a look too"?

As a code reviewer, I want to see what I need to do to unblock people.
  1. Has someone asked for my review, which I haven't provided?
  2. Has someone responded to one of my comments?
  3. Has someone acted on one of my comments where there's follow-up I should do?
Within the tool, there are two very distinct conversations to have; "Is this change generally sensible" and "Are the specifics of the code sensible?" - fortunately, the existing tools seem to cover this pretty well, with "change comments" and "line comments".

Now that the general concepts are covered, what specifics do I want in a code review tool?
  1. High quality code browsing.
    • Syntax highlighting.
    • Click-through and find-usages of symbols.
  2. High quality diffing.
    • Code diff chunks should include the name of the function, and type, the code is in.
    • Reviewing similar images? Show a diff!
  3. Context of wider change.
    • Is there a ticket/issue describing the larger-scale work that this change is part of? Link to it! Render some text from it!
    • Are there follow-up changes? Does it build on previous changes?
    • Does this change depend on other pending changes? Should they be merged atomically?
      • Ideally changes should be very small, but navigable with the right context to understand the bigger picture. Being able to change that level of granularity ("show me the change to this file in this pull request", "show me the whole change of the whole stack of dependent pull requests") would be nice, too.
  4. Previews.
    • Editing a markdown document? Render a preview, and render a diff!
    • Editing a web frontend? Host a preview of the site somewhere, and link to it.
  5. An elegant, at a glance view of what I need to do, both per-review, and across all reviews.
  6. Reviewers should be able to give code-edit suggestions.
    • Double-clicking on the code, fixing the typo, and presenting the code author with an "Accept suggestion" button is so much lower friction than saying "nit: It's just a typo, but there's an extra s here", and then requiring the code author to change branch, make an edit, commit, push, and say they've fixed it.
  7. Automated reviewers should be easy to plug in.
    • If I can write a tool which detects problems, I can save humans time.
    • Doubly so, if they can propose code-edit suggestions which can be accepted at a button-press.
    • There are a few classes of these; "always correct" suggestions vs "maybe a good idea" suggestions.
  8. Merging code by pressing a button in a web UI is nice.
    • Having an option to merge automatically when reviewers are happy is even nicer.
What don't I want in my code review tool?
  1. Anything to do with the specifics of a version control system.
    • Maybe the system ingests things via git push or ingesting a patch file, and maybe it can merge things onto a branch, but those should be the only times version control matters.
  2. Visual clutter.
    • Everything I want to know about a review should be quick and easy to discern at a glance. I should be able to dig in deeper when I need/want to.
  3. An overly simplified, or overly strict, model of review.
    • Different projects, teams, companies, and people, use different models for code review requirements, and code ownership. Code review tools all seem to either treat all reviewers as equal, or assume that each reviewer ticks a certain box (generally "owns directory X"). If what I want to say is "I'm happy with this code, but Sophie should check it over", or "I'm happy with this code, but here are some trivial things you should address first", or "Two people should review every change", I want that to be obvious.
There are also some crazy ideas that I'd love to see, but which sound hard:
  1. AST-aware formatting, diffing, and history.
    • It's crazy that people need to have style guides, and style wars, over when lines should wrap, or how much things should be indented. "Lines" of code is a weird concept. Statements and expressions are the building blocks of code, not lines!
  2. Semantically aware change decomposition.
    • You renamed a field, then extracted a function, then added a new call to the function. Wouldn't it be cool if your code review tool could tell your reviewer that's what you did, rather than them inferring it from reading?

Sunday, 3 June 2018

Rust minimum versions: SemVer is a lie!

A couple of months ago, cargo got a new feature I've been wanting for a while; an option to select the minimum, rather than maximum, compatible versions of each of your (transitive) dependencies, according to semver. So if you say you depend on foobar version 1.1, it will pick 1.1.0 not 1.1.16, as opposed to the standard behaviour of choosing 1.1.16. By traditional wisdom, this isn't something you actually want to use in real life, because newer versions tend to have optimisations / security fixes / ... which older versions don't. But it's a useful tool for checking whether your declared supported dependency versions are accurate.

This has particularly been in my mind recently because of Russ Cox's recent work on semantic versioning dependencies in Go. The tl;dr of that is: Avoid making breaking changes, libraries specify their minimum supported version of their dependencies, and the Go tooling will choose the minimum supported version of each dependency which works for the transitive set of dependencies. It does exactly the thing that traditional wisdom says is bad, but Russ argues pretty convincingly that this is good [1]. However, Russ's arguments rely on this being a community-wide adopted standard practice; if the community isn't all using the minimum supported version of things, it doesn't work, because nothing advances the base version of libraries, and everyone ends up stuck with super old versions of everything.

I thought I'd see how minimum versions worked out in the Rust ecosystem. I used my current main Rust project, the scheduling engine of the Pants build tool, as my playground. The project is about 50kloc, and has 134 transitive dependencies. Those dependencies include a handful of fairly common pure-rust libraries (clap, futures, tokio), a handful of C/C++ libraries (lmdb, grpcio, fuse), and a long tail of others.

I was expecting some things not to work, but I was mostly expecting to just need to bump some versions because libraries were using features introduced in newer minor versions than they claimed. I was surprised by some of the ways that things didn't work, and how hard it was to fix them...

Language stability

For the last couple of years, Rust has felt like a relatively stable, but quickly improving language. If you look through the "Compatibility Notes" and "Breaking Changes" section of the release notes since 1.0 (now 3 years ago), pretty much everything is a minor and niche issue; the language has been adding many things, but rarely breaking old ones.

There has been some discussion as to whether the minimum version of rustc you need to compile with should be considered part of a crate's API, and whether increasing that value should require a major version change. I was surprised, however, to find that the opposite was a common problem when pinning to minimum versions - the minimum supported semvers of some of my transitive dependencies are so low that they don't compile with recent (or for some, any post 1.0) versions of rust!

Some examples, with rust 1.25:
  • log is, surprisingly, a problematic crate. One of my dependencies specifies a '0.*' dependency on log, I guess because it doesn't want to be the most constraining dependency for something it doesn't care much about - the antithesis of the requirement for the whole community to bump minimums of Russ's proposals for Go. This means that I actually end up with all of log 0.4.1, log 0.3.1, and log 0.1.0, all in my application. But log 0.1.0 uses regex 0.1.0, which was released in 2014, pre rust 1.0, when Rust was a very different language. But it's not obvious what this crate should be doing. Technically, if you found the right compiler, log 0.1 would probably be fine for this library. In all practical terms, it doesn't work with log 0.1, because I suspect there isn't a version of the compiler that supports both regex 0.1.0 and the language features of my dependency. What version should it specify? Should it find whichever version of log's transitive dependencies built with rust 1.0, and specify that? Or rust 1.12? Or rust 1.20?
  • rand 0.3.13 (released January 2016) doesn't compile, because it depends on winapi 0.0.1, again released in 2014. I guess it was useful to find out that the protocol buffer compiler I'm using is using a two year old crate for making temporary files, which itself depends on rand 0.3.13 (and that there are much more commonly used crates for this purpose), but am I really going to send them a PR moving them over to use tempfile instead? Maybe... Maybe switching to actively maintained crates with stable versions is beneficial to the ecosystem. But that feels like it would be a surprising PR to receive, and certainly isn't the base expectation of the Rust community today.
  • All of my C/C++ dependencies end up depending on pkg-config 0.3.0, which doesn't work with any rust since 1.0.  But eeeeeverything does that! libz-sys, libgit2-sys, libssh2-sys, lzma-sys, libsqlite3-sys, libsodium-sys, ffmpeg-sys, libdbus-sys, glib-2-0-sys, and dozens of other widely used libraries maintained by people who know what they're doing, specify their pkg-config dependenciy as ^0.3. Are they all doing something wrong? Surely not...

Fixing these things is hard

I got my library compiling, eventually, but it took a lot of effort, and required forking several projects. Let's take a look at the log example. Ideally, I should somehow be able to say "I don't care what my dependencies say they need, give them log 0.4.1, that's the log I want to use". I found the [patch] section which looked like it did what I wanted, but it doesn't appear to cover transitive dependencies. Maybe there's something I missed, here?

So I started forking all of my dependencies which themselves had problematic dependencies. Ideally I can get PRs merged to update these things, but it's not obvious that "increase the minimum version you specify so that the minimum version actually compiles on modern Rust" is a reasonable pull request; partially because rust versions don't factor into versioning anywhere, and partially because minimum versions aren't expected to be used, even though they may technically be expected to work. Again, it feels like a weird PR to receive.

Does this matter?

Maybe this doesn't matter, in practical terms. Cargo prefers the most recent available version, and that mostly works for people. And maybe this whole problem is just a relic of pre-1.0 to post-1.0 transition, and it will go away at some point. But it seems strange that we bother to go through all of this writing down semvers of our dependencies, only for them to frequently be lies. Maybe Russ Cox is right, and you need to force people to keep them accurate by actually using the minimum versions. Maybe we should give up with specifying minimum versions at all, and just always use the most recent version (relying on Cargo.lock files for reproducibility). Maybe now that we have -Z minimal-versions, it will be easy to add these checks to people's CI (or even on the crate publishing path), and we can enforce that what we write down is accurate. Maybe this is yet another reason we should be factoring "supported rust versions" into dependency resolution (either by including it in semver, or by tracking it separately in Cargo.toml files). I don't know what we should do as a community, but right now, things feel a little weird.





1: Russ Cox's series of blog posts is very interesting, and well written, and I recommend giving it a read if you're interested in this kind of thing. I have mixed opinions on the design in general, but that's a conversation for a different day!