Errors

Rust's try operator (?) and support for sum types (e.g., Result, which can be either Ok or Err) make for powerful and easy-to-reason-about error handling. Unfortunately, as of this writing in late 2020, there is no consensus solution for representing and reporting errors in crates. The existing std::result::Result and std::error::Error types provide the bare bones for representing successful and failed operations, but they're not adequate on their own.

In the first year or so of Elastio, we used the snafu crate, an opinionated implementation of error handling in Rust that seemed to work well. Many of our crates still use snafu today, because there's no reason to migrate them. Recently, however, momentum has been building behind two crates that together are fast becoming the de facto Right Way(tm) for doing things: anyhow and thiserror.

anyhow provides a universal Error type that can wrap any std::error::Error implementation and adds some useful operations on top. It's the rough equivalent of throws Exception in Java. We use anyhow when building CLIs that need to handle various kinds of errors from different crates and have no need to wrap those errors in a specific error type. In some cases we also use anyhow in tests, where the alternative would be Box<dyn Error>, which is nasty.
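For comparison, here is a minimal std-only sketch of the Box<dyn Error> approach that anyhow replaces. The read_port function and the BoxError alias are hypothetical. The point is that ? funnels errors of unrelated types into one return type; anyhow::Error fills the same role while adding context support and, where enabled, backtraces.

```rust
use std::error::Error;
use std::io::Read;

// The std-only way to say "any error": a boxed trait object.
type BoxError = Box<dyn Error + Send + Sync + 'static>;

// Hypothetical helper: reads a port number from any reader. `?` converts
// both the I/O error and the integer parse error into the same boxed type.
fn read_port(mut input: impl Read) -> Result<u16, BoxError> {
    let mut text = String::new();
    input.read_to_string(&mut text)?; // std::io::Error, e.g. on invalid UTF-8
    let port: u16 = text.trim().parse()?; // std::num::ParseIntError
    if port == 0 {
        // A plain &str can be boxed into an error too.
        return Err("port must be non-zero".into());
    }
    Ok(port)
}

fn main() {
    for input in &["8080\n", "not a port", "0"] {
        match read_port(input.as_bytes()) {
            Ok(port) => println!("port = {}", port),
            Err(e) => println!("error: {}", e),
        }
    }
}
```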

thiserror is used at Elastio when a library crate needs to expose an error enum as part of its API. Where we historically used snafu, newer crates (and any crates created in the future) use thiserror to build a CrateNameError enum.

This chapter describes the best practices we've evolved with these crates, and should be followed unless there's a good reason to do something different.

Official Docs

Most of the details of how to use anyhow and thiserror are covered in their respective docs. None of that will be repeated here, so before reading the rest of this chapter make sure you've reviewed the official docs for both crates and have a good understanding of how they work in general (and, in particular, how anyhow is different from thiserror and under what circumstances one should use one versus the other).

Legacy Snafu

As noted above, many of our biggest and most important crates were written before thiserror was clearly the way forward for library error reporting. Those crates use snafu instead. There's nothing wrong with snafu, and it's similar in design to thiserror in many ways. Those crates that use snafu will continue to do so, and if you find yourself needing to add or modify the error variants in those crates, you too must use snafu. Porting existing error handling to thiserror without a compelling reason is not a good use of engineering time.

Having said that, starting a new crate in 2020 or beyond and using snafu to build the error type is also not a good use of engineering time.

Error variant naming

Each variant of the error enum is obviously an error, because it's a variant of the error enum. Thus, the word Error should not appear in the names of the enum's variants, as it's redundant. So IoError is a bad name for a variant; Io is good.

Error messages

`thiserror`

When defining an error type with thiserror, it's easy to define what the error message should be for an error:

```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum Error {
    #[error("IO error")]
    Io {
        source: std::io::Error,
    },
}
```
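To make the mechanics concrete, here is a hand-written, std-only sketch of roughly what that derive provides: Display prints only the #[error(...)] message, and source() returns the field named source. The code thiserror actually generates differs in detail; this is an approximation for illustration.

```rust
use std::error::Error as StdError;
use std::fmt;

// Roughly what #[derive(Error)] provides for the `Io` variant, written by hand.
#[derive(Debug)]
pub enum Error {
    Io { source: std::io::Error },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Only this error's own message; the cause is not formatted in.
            Error::Io { .. } => write!(f, "IO error"),
        }
    }
}

impl StdError for Error {
    // The field named `source` is reported as the underlying cause.
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::Io { source } => Some(source),
        }
    }
}

fn main() {
    let err = Error::Io {
        source: std::io::Error::new(std::io::ErrorKind::NotFound, "config.toml missing"),
    };
    println!("{}", err);
    println!("caused by: {}", err.source().unwrap());
}
```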

You might be tempted to try to make another error type's message part of your error, like this:

```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum Error {
    // WRONG DON'T DO THIS
    #[error("IO error: {source}")]
    Io {
        source: std::io::Error,
    },
}
```

You might even still find this pattern in our code. However, the current best practice is to report the error that caused a particular error by returning it from the std::error::Error::source() method, which for thiserror means a field named source or one marked with the #[source] or #[from] attribute. Why?

Because it's often not at all useful to include the message of an inner error. Maybe that error is itself just a higher-level error type, and the real cause is nested three or more errors deep. Or maybe you need the details of every error in the chain, from the root cause up to your error type, to understand what really happened.

Therefore, instead of forcing the nested error's message into your own, you should rely on crates like anyhow (or color-eyre) to pretty-print the whole error chain at the point where it is communicated to users (printed to the console in the case of a CLI, or written to the log in the case of a server).
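The sketch below shows the idea behind that pretty-printing: walk source() from the outermost error down to the root cause and join the messages. LoadConfigError and report are hypothetical names. anyhow performs this walk for you when an anyhow::Error is printed with {:#} (one line, causes separated by ": ") or {:?} (a multi-line "Caused by:" list).

```rust
use std::error::Error;
use std::fmt;

// Hypothetical outer error whose cause is an io::Error.
#[derive(Debug)]
struct LoadConfigError {
    source: std::io::Error,
}

impl fmt::Display for LoadConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Own message only; the cause is reachable through source().
        write!(f, "failed to load configuration")
    }
}

impl Error for LoadConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

// Joins every message in the chain, outermost first, with ": ".
fn report(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}

fn main() {
    let err = LoadConfigError {
        source: std::io::Error::new(std::io::ErrorKind::PermissionDenied, "permission denied"),
    };
    println!("{}", report(&err));
}
```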

`anyhow`

When reporting errors with anyhow, the principle is the same but the mechanics are slightly different. Since anyhow::Error is a universal container for any error type, you do not use anyhow to define strongly typed errors; instead, you wrap arbitrary error types in it. Sometimes, though, you have an error e that you want to report as an anyhow::Error, but with some additional context to help clarify it. This is the wrong way:

```rust
use anyhow::anyhow;

fn main() {
    // Assume `e` contains an error you want to report
    let e = std::io::Error::last_os_error();

    // DO NOT DO THIS: only `e`'s message survives, flattened into a string
    let _err = anyhow!("Got an error while trying to frobulate the gonkolator: {}", e);
}
```

This has the same problem as the thiserror example above: you're losing all of the information in e other than its message. Maybe it had its own source with valuable context, or a backtrace that would have clarified where this happened. Instead, you should use anyhow to wrap the error in some context:

```rust
fn main() {
    // Assume `e` contains an error you want to report
    let e = std::io::Error::last_os_error();

    // This is the right way: `e` is kept intact as the source of the new error
    let _err = anyhow::Error::new(e).context("Got an error while trying to frobulate the gonkolator");
}
```
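A simplified, std-only model shows why context preserves more than format! does. WithContext below is hypothetical and is not how anyhow is implemented internally; it only illustrates the shape of the result: a new message on top, with the original error still reachable through source() and recoverable by downcasting, which a formatted string can never offer.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical model of "error + context". NOT anyhow's implementation.
#[derive(Debug)]
struct WithContext<E> {
    context: String,
    source: E,
}

impl<E> fmt::Display for WithContext<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The new message is just the context...
        write!(f, "{}", self.context)
    }
}

impl<E: Error + 'static> Error for WithContext<E> {
    // ...while the original error stays available as the cause.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

fn main() {
    let err = WithContext {
        context: "Got an error while trying to frobulate the gonkolator".to_string(),
        source: std::io::Error::new(std::io::ErrorKind::TimedOut, "device did not respond"),
    };
    // The concrete io::Error (including its kind) can still be recovered.
    let inner = err.source().unwrap().downcast_ref::<std::io::Error>().unwrap();
    println!("{} (cause: {:?})", err, inner.kind());
}
```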

Using anyhow `context` or `with_context`

The above example uses context on an anyhow::Error. anyhow also provides a Context trait that adds context and with_context methods to Result and Option, to make it easier to wrap possible errors in context information.

Be advised that in this case you should avoid eagerly building strings when calling context, because the message is constructed even when the operation succeeds. For example:

```rust
use anyhow::Context;

fn get_username() -> String {
    // ...
    "foo".to_owned()
}

fn get_host() -> String {
    // ...
    "foo".to_owned()
}

fn main() -> anyhow::Result<()> {
    // WRONG: the message is formatted (and both helpers are called) even on success
    std::fs::File::create("/tmp/foo")
        .context(format!("Error creating file foo for user {} on host {}", get_username(), get_host()))?;

    // RIGHT: the closure only runs if File::create fails
    std::fs::File::create("/tmp/foo")
        .with_context(|| format!("Error creating file foo for user {} on host {}", get_username(), get_host()))?;

    Ok(())
}
```

By passing a closure to with_context, you defer evaluation of the format! call until File::create actually fails. On success you skip all of this computation: the calls to get_username() and get_host(), and the heap allocation for the message.
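The same eager-versus-lazy distinction exists in the standard library as Option::ok_or and Option::ok_or_else, which makes it easy to observe without anyhow. In this sketch, describe_failure is a hypothetical stand-in for an expensive message builder, and the Cell counter records how often it actually runs.

```rust
use std::cell::Cell;

// Hypothetical expensive message builder; counts its own invocations.
fn describe_failure(calls: &Cell<u32>) -> String {
    calls.set(calls.get() + 1);
    format!("Error creating file foo for user {} on host {}", "foo", "foo")
}

fn main() {
    let calls = Cell::new(0);
    let present: Option<u32> = Some(7);

    // Eager: the message is built before ok_or runs, then thrown away.
    let _ = present.ok_or(describe_failure(&calls));
    println!("after ok_or: {} call(s)", calls.get());

    // Lazy: the closure is never invoked because the value is present.
    let _ = present.ok_or_else(|| describe_failure(&calls));
    println!("after ok_or_else: {} call(s)", calls.get());
}
```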

`error` module

Each library crate should have an error module. This should define a thiserror-based error enum named CrateNameError, where CrateName is the PascalCase form of the crate's name.

Many Rust crates, and the standard library itself (std::io::Error, std::fmt::Error), name their error type simply Error, but this hurts code clarity when dealing with several different crates' error types at once.

The error module should also define a type alias called Result, which aliases to std::result::Result with the default error type set to the crate's error type, e.g.:

```rust
// In `error.rs`
pub enum CrateNameError { /* ... */ }

pub type Result<T, E = CrateNameError> = std::result::Result<T, E>;
```
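A fuller, dependency-free sketch of such a module follows. CrateNameError, its variants, and both functions are hypothetical; Display and Error are implemented by hand only to avoid the thiserror dependency, which a real crate would use instead. The variants follow the naming rule above (Parse, not ParseError), and the default type parameter still lets an individual function override E.

```rust
use std::fmt;

// Hypothetical crate error: the enum carries the crate name,
// the variants do not repeat "Error".
#[derive(Debug, PartialEq)]
pub enum CrateNameError {
    Parse(std::num::ParseIntError),
    OutOfRange { value: i64 },
}

impl fmt::Display for CrateNameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CrateNameError::Parse(_) => write!(f, "could not parse number"),
            CrateNameError::OutOfRange { value } => write!(f, "value {} is out of range", value),
        }
    }
}

impl std::error::Error for CrateNameError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            CrateNameError::Parse(e) => Some(e),
            CrateNameError::OutOfRange { .. } => None,
        }
    }
}

// E defaults to the crate's error type but can still be overridden.
pub type Result<T, E = CrateNameError> = std::result::Result<T, E>;

// Uses the default error type.
pub fn parse_percent(s: &str) -> Result<u8> {
    let value = s.trim().parse::<i64>().map_err(CrateNameError::Parse)?;
    if !(0..=100).contains(&value) {
        return Err(CrateNameError::OutOfRange { value });
    }
    Ok(value as u8)
}

// Overrides E for a function that needs a different error type.
pub fn first_char(s: &str) -> Result<char, &'static str> {
    s.chars().next().ok_or("empty string")
}

fn main() {
    println!("{:?}", parse_percent("42"));
    println!("{:?}", parse_percent("150"));
    println!("{:?}", first_char(""));
}
```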

If necessary (and only if necessary!), the root of each library crate should expose the error module publicly. As of now, the only reason this would be necessary is to expose the Snafu-generated context selectors to other crates, which is only needed if macros are generating code that needs to use those error types. That's an edge case; in general, the error module should be private.

In all cases, the error enum and Result type alias should be re-exported from the root of the crate, e.g.:

```rust
// In lib.rs

// The `error` module should not be public, except for the edge case described above
mod error;

// but the error enum and the Result type alias should be part of the public API in the root module
pub use error::{CrateNameError, Result};

// And all functions should use this public re-export.
// Note this is using crate::Result, not crate::error::Result
pub fn some_public_func() -> Result<()> {
    todo!()
}
```

Other modules in the crate should import crate::CrateNameError, NOT crate::error::CrateNameError. This keeps crate code consistent with external callers, and it's also less typing.
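Here is a single-file sketch of this layout, assuming inline mod blocks stand in for the separate error.rs and parser.rs files of a real crate. The parser module, first_word, and the Empty variant are hypothetical.

```rust
// Stands in for error.rs: private module, public items.
mod error {
    #[derive(Debug, PartialEq)]
    pub enum CrateNameError {
        Empty, // hypothetical variant
    }

    pub type Result<T, E = CrateNameError> = std::result::Result<T, E>;
}

// Re-exported from the crate root; the module itself stays private.
pub use error::{CrateNameError, Result};

// Stands in for another module such as parser.rs.
mod parser {
    // Import from the crate root, not from crate::error.
    use crate::{CrateNameError, Result};

    pub fn first_word(s: &str) -> Result<&str> {
        s.split_whitespace().next().ok_or(CrateNameError::Empty)
    }
}

// Public functions use the root-level Result alias.
pub fn some_public_func(input: &str) -> Result<&str> {
    parser::first_word(input)
}

fn main() {
    println!("{:?}", some_public_func("hello world"));
    println!("{:?}", some_public_func("   "));
}
```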

Note that when using Snafu, Snafu's context selectors should NOT be re-exported this way, and when referenced within other modules in the crate, those modules should use crate::error, and refer to the context selectors with error::SomeErrorKind.

Using `ensure`

anyhow provides the ensure! macro, which tests a condition and returns an error if it is false. Prefer it to an explicit if or match expression that returns an error, unless the longer form adds some clarity.

Elastio Engineering Handbook