Adventures with Files.lines()

We realised we had a problem when our performance testing environment locked up.

Our deployment system wasn’t responding to input, and when we looked at it we had a bunch of exceptions: turns out we were running out of file handles. We’d seen similar situations before when the file handle count was unnecessarily low, but that wasn’t the case here. Fortunately, we re-deploy and run a fresh perf test roughly every 45 minutes, so that narrowed the range of possible commits down significantly. It didn’t take us long to discover the issue was in the new system monitoring service.

The system monitoring service was incredibly lightweight. It didn’t need to be in Java, but there were advantages with it having as much as possible in common with the rest of our services. All it did was read various system metrics from the /proc filesystem and propagate them to our monitoring database, giving updates every second on things like CPU usage and network errors.

We were using the fancy new Java 8 nio stream methods. These made it much easier to compose some simple filtering and pattern matching to extract the data we wanted – most of the key methods were single-liners, taking a stream and filtering then mapping it with references to nice, simple methods, which each did one small thing in isolation. It was easy code, both to write and read. We’d only started using these methods recently and were totally converted by how much nicer they were to deal with than any of the pre-Java 8 alternatives.

What we didn’t realise, until this moment, was that calling Files.lines() multiple times a second (as there were multiple different sensors) is a recipe for a bad time if you didn’t know that you need to close the stream.

On the one hand, this makes sense: how are we supposed to do file operations if we don’t open a file descriptor? On the other hand, it’s frustrating on two counts:

Firstly, the stream knows when we’re done with it: when we call a terminal operation. Up until that point, we can’t get anything out of the stream. After that point, we can’t do anything more with the stream. It would make sense for the stream to open the file descriptor as the first operation in the terminal operation, and then close its file descriptor just before it returns.

There are good reasons why this doesn’t happen in the general case: see Brian Goetz’s response to this StackOverflow question for the rationale. It’s worth noting that this doesn’t apply to the stream returned by Files.lines(), as it creates and encapsulates its resources.

Secondly, it’s easily fixed but doing so makes methods a whole lot worse. The easiest way to fix it is to turn this:

public List method(Path path)
    return Files.lines(path)

into this:

public List method(Path path)
    try (Stream lines = Files.lines(path))
        return lines.filter(this::interestingLines)

It’s no longer a one-liner. More importantly it’s gone from dealing with one domain (processing files) to two domains (processing files and managing the file resource). That’s not something functions should do, ideally.

So, what did we do? Well, the immediate fix was to wrap the call in a try-with-resources block, just like the example above. That solved our problem.

Well, it solved one of our problems – the problem of the leaking file descriptors in that one particular location. That, however, was a very superficial problem: there was an underlying problem, which is that we’d written code which leaked file descriptors in the first place, and only found it because we were pounding the filesystem with the frequency and subtlety of a pneumatic drill. How can we address that?

One option is communication and education: we could let all our colleagues know that this method did something we didn’t expect, and they should be aware of it. This would be a Good Thing To Do, but it’s imperfect: some people will ignore the broadcast, some will forget it, some people might even have the gall to start after we’ve sent the message. So that’s not what we did.

What about the documentation, you may ask?

“But Mr Dent, the plans have been available in the local planning office for the last nine months.”
“Oh yes, well as soon as I heard I went straight round to see them, yesterday afternoon. You hadn’t exactly gone out of your way to call attention to them, had you? I mean, like actually telling anybody or anything.”
“But the plans were on display …”
“On display? I eventually had to go down to the cellar to find them.”
“That’s the display department.”
“With a flashlight.”
“Ah, well the lights had probably gone.”
“So had the stairs.”
“But look, you found the notice didn’t you?”
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”

If we had read the documentation, we would indeed have found a warning about closing the stream once we were done with it. It wasn’t on the documentation for the method we were using, but on an overload of that method referred to by the method we were using (and thus not easily discovered within our IDE). Not quite “Beware of the Leopard” territory, but not advertised as well as it could be.

So maybe we could use this as an opportunity to remind our colleagues of the importance of reading the documentation. This advice would, if followed, have a strictly greater benefit than just warning them of the one case, as it would cover both this mistake that could be made, and countless others as well.

We didn’t do that, either. That’s because this advice actually would be a Bad Thing: it’s bad because it’s patronising, and there’s no real upside because nobody reads documentation unless there’s something they want to find out.

Documentation is, quite simply, a bad place for warnings.

The real problem we had was we didn’t know we had to enclose calls to Files.lines() in try-with-resources blocks. We solved that problem by making sure it can’t happen again.

We’re very proud of our continuous integration pipeline at LMAX. It has various steps: the first step is the commit build, which ensures everything compiles, passes checkstyle, and the unit tests pass. This is then cascaded into a cornucopia of downstream builds – integration and acceptance tests, compatibility and performance tests, and more esoteric builds as appropriate.

One of these runs various analysis tools, including Findbugs. What’s important about how we run Findbugs is that we don’t just use the out-of-the-box detectors: we write our own, and (just like all our other tests) they live in the same project as our production code. Every time we check in, we rebuild our Findbugs detectors and then run them on our code.

This means it is incredibly easy to add new detectors whenever we want. It’s just writing another test.

That’s how we dealt with this problem – we ensured that whenever anyone called Files.lines() outside a try-with-resources block, the build would tell them, and it would remain red until the problem was fixed.

There are a couple of key reasons why a reactive solution, like a failing test, is better than a proactive solution, like educating people so they don’t write buggy code in the first place.

Firstly, there is a limit to how much cognitive load people are capable of handling. If you’re thinking about closing the stream, that has a cost – that’s brain cycles that could be used for something better.

Maybe it means some other, more domain-relevant thought process gets bumped back, and you end up with a business logic bug. Maybe it means you’re thinking more about details than the big picture, and your class design suffers. Maybe it just means you end up taking longer to write the method. That’s thinking that could be handled for you by a computer. Why wouldn’t you want to offload it?

Secondly, and more importantly, proactive solutions aren’t reliable – because people aren’t reliable. We forget, we overlook, we underthink – no matter how good we are, and those of us who believe we don’t make those mistakes are not the good ones. That’s why we write tests: offload that which can be dumbly but reliably detected to something which is good at being dumb but reliable.

So, we’re going to write a test to catch this. What do we want this detector to do?

Ideally, we’d want it to ensure the resources opened by Files.lines() are closed. If we wrote such a detector, that would be nicely general and we could even open-source it – add it to findbugs-contrib, perhaps. Again, that’s not what we did.

For those of you who haven’t had the pleasure of writing Findbugs plugins, it’s – how should I put this? – not the most approachable API I’ve ever used. It takes a bit of experimentation to find out viable ways of doing even the simplest things. Having written a dozen small detectors, my first instinct was to try and learn by example – there are already unclosed-resource detectors in Findbugs, so maybe that would be extendable.

Here’s what that detector looks like.

A couple of things leap out immediately here.

Firstly, it’s not designed for extension. Ideally, it would separate the logic around finding open resources from the possible causes, and adding new cases would be as simple as constructing it with different parameters. Instead, we have a bunch of untouchable internal static final arrays, meaning a) the code is cluttered with both logic and configuration, and b) I can’t extend it for this new case. I’d need to rewrite it all (well, copy-paste it) to have something identical operating on new cases.

Obviously, this wasn’t designed for reuse.

Secondly, it’s complex. It keeps track of all sorts of state, following through variable lifetimes and return values and whatnot. It doesn’t do it in a particularly readable way, and to be honest I didn’t spend a lot of time trying to understand how it works, because I quickly came to the conclusion that this was the wrong solution for our problem.

This is not to say it is bad code. It’s important to point out here that the problem the Findbugs team was trying to solve is actually quite different to the problem I was trying to solve when writing detectors. For them, the cost of complexity is lower, because they’re intimate with the API and the programming techniques and patterns to create rigorous detectors. More importantly, for them, the cost of false positives is very high.

Static analysis is an incredibly useful tool, but it has its costs. For a free tool like Findbugs, the costs are those of integration into the build system, the time to run the job, and the cost of addressing defects when identified. The latter is probably the principle cost (while, of course, simultaneously being the benefit) – but this is horribly front loaded.

Running a new static analysis tool across a codebase will highlight a number of issues which need addressing, and if you don’t address them all, then you have a job which isn’t normally green, which is as good as pointless. Furthermore, this cost predates the point where you start seeing the real value.

At this point, if you have false positives, then you’re being told by some interloping robot gatekeeper judge that your code is bad and wrong when it’s obvious to you it isn’t, and furthermore the burden is on you to change the code to meet its standards. This will change people’s opinions on the cost-benefit balance of making the tool work, and in the worst case will train them to suppress warnings as a first resort instead of addressing them. A small number of false positives will be forgiven: a large number will not.

It is important, therefore, that such tools have as few false positives as possible out of the box. The complexity of the FindOpenStream detector is probably the minimum that could be gotten away with as a general-usage tool.

The equation is utterly changed when considering internal usage. The cost of complexity goes way up: few people on the team have even a passing experience with detectors, and it’s important that anyone on the team can handle code that anyone else wrote, so there’s a real emphasis on making detectors absolutely as simple as possible. Furthermore, the cost of false positives goes way down.

False positives in static analysis are usually cases where there is a potential bug which is easily identified and then there are multiple ways to address. For this example, our cause is calling Files.lines() and our remedies include surrounding with try-with-resources, manually closing, returning the stream from the function to a calling function which manually closes, and so on. False positives occur when the detector either over-identifies causes or under-identifies remedies. The thing is, within a small-ish team, under-identifying remedies can also be described as imposing a coding convention.

Let’s go back to our original, problematic code:

public List method(Path path)
    return Files.lines(path)

We only kept the try-with-resources local for a short while, quickly replacing it with this:

public List method(Path path)
    return MyFiles.lines(path, 
                         lines -> lines.filter(this::interestingLines)

This is very nearly as elegant and simple-to-use as the original, and it calls the following utility method, extracted out for re-use:

public static T lines(Path path, Function<Stream, T> f)
    try (Stream lines = Files.lines(path))
         return f(lines);

This guarantees that the stream is closed, moves the responsibility for closing the stream to a separate location, and allows us to retain the conciseness that attracted us to Files.lines() in the first place.

It also makes writing a detector incredibly straightforward: instead of checking to see if the stream provided by Files.lines() is closed, we simply ban the use of Files.lines(). Regardless of context.

Instead of Files.lines(), we demand that MyFiles.lines() is used in its place (suppressing, of course, the detector for the implementation of MyFiles.lines()). If anyone does call Files.lines(), a build breaks, and the error message from the detector points us to the approved implementation.

This is an approach which is clearly not acceptable for Findbugs to impose, so we’re not going to open-source it. In terms of getting such a detector into Findbugs, a better approach would be simply to request this is added to the existing FindOpenStream detector for the next revision – so that’s what I did.

It is, however, fine in a small team. There are few enough use sites that we can move to this new convention when we introduce it, and having a consistent approach to a given problem is generally a net good, as long as that approach meets our needs. If it doesn’t meet our needs, well, there’s another advantage we have over the Findbugs team: it’s way easier to change our code. If we decide as a team that we don’t want to use this convention, it’s trivial for us to change (or even wholesale dispose of) this check.

The detector looks like this:

public class FilesDotLinesDetector extends IllegalMethodCallDetector
    public FilesDotLinesDetector(BugReporter reporter)
        super(reporter, new IllegalMethodCall(Files.class, 

I’ve elided little here: IllegalMethodCallDetector and IllegalMethodInvocation are simple utility classes within our Findbugs project and they’re both pretty trivial too. The point here is that if anyone looks at this detector, it’s incredibly clear what’s going on, even if they’ve never looked at the Findbugs API before. If anyone wants to ban the use of any other methods, that’s trivial for them.

So, to summarise:

  • Files.lines() leaks file descriptors if it isn’t closed properly. This is easy to fix.
  • Fixing only the immediate problem – the offending callsite – is insufficient. It is a bug which is likely to reoccur: something should be done about that.
  • The best way to prevent it reoccuring is reactively, not proactively: detect when it happens and then demand a fix.
  • As proactive solutions go, documentation sucks particularly badly.
  • The reactive solution is easiest to implement when it errs on the side of broadness.
  • The consequences of overly broad tests include valuable benefits, and the costs are easily addressed, as long as the rules are not considered immutable.

PS: It may be that a better approach to a utility method would instead to be to define our own Stream implementation, which simply wraps an existing Stream, except it wraps terminal methods in a try-with-resources block. That seems obvious now but it only just occurred to me: it’s beside the key point of this post, though.

PPS: My spell-checker keeps trying to turn Findbugs into Windbags. I suspect some of you may find that apropos.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s