Sunday, September 10, 2017

A StringReader in Rust

Update: see the note at the end of this post.

I was recently working on a rust crate (read library) for parsing input files for Conway's game of life. The idea was to have a Parser trait like this:

use std::io::Read;
 
pub trait Parser {
    fn parse<T: Read>(&mut self, input: T) -> Result<GameDescriptor>;
}

The idea was to be as rustacenous (code like a rustacean?) as possible and use the standard traits where applicable. In this case this means accepting a Read as input, similarly to accepting an InputStream in a Java class.

I was writing unit tests when I came upon a slight problem: I didn't want to create test data files for simple tests, so ideally in my test I would just pass a string to the parse method. But Rust's String doesn't implement Read (why not??). Recalling the rules around trait implementations, I knew I couldn't implement Read for String myself.

The Rust book:

either the trait or the type you’re implementing it for must be defined by you. Or more precisely, one of them must be defined in the same crate as the impl you're writing.

I didn't find any solution for this on the web, and no crates that help with this issue either. So I decided to publish one myself: stringreader.

StringReader

Here's the basic code:

use std::slice::Iter;

pub struct StringReader<'a> {
    iter: Iter<'a, u8>,
}
 
impl<'a> StringReader<'a> {
    pub fn new(data: &'a str) -> Self {
        Self {
            iter: data.as_bytes().iter(),
        }
    }
}

Simple enough: the struct contains an Iter<'a, u8> (more on that later), and there's a constructor that accepts a &'a str, so the StringReader must not outlive the input. The iterator inside StringReader is initialized by converting the input to a slice of bytes (&[u8]) and then getting an iterator for that slice.

Implementing the Read trait

The std::io::Read trait requires only a single method to be implemented, but it provides some additional helpers via default implementations. In order to make StringReader implement Read one needs to provide an implementation for read:

fn read(&mut self, buf: &mut [u8]) -> Result<usize>;

The read method accepts a mutable slice of u8 (read unsigned byte) and returns a std::io::Result<usize>, where the Ok value should contain the number of bytes read. At first I thought "why isn't there a parameter telling me how many bytes to read?", but then I remembered that slices in Rust know their length, so the method can never write past the end of the buffer.

My implementation for the trait looks like this:

use std::io::{Read, Result};

impl<'a> Read for StringReader<'a> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        for i in 0..buf.len() {
            if let Some(x) = self.iter.next() {
                buf[i] = *x;
            } else {
                return Ok(i);
            }
        }
        Ok(buf.len())
    }
}

I iterate over a range from 0 to the length (exclusive) of the passed-in byte slice. Then, for each index, I try to get the next byte from the iterator and, if it's a Some, write it to the slice. Otherwise, if the iterator returns a None, we know we've reached the end of the string, so we return a positive result with the number of bytes read. If the end of the for loop is reached we just return the length of the slice.
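To see this behaviour in action, here is a small self-contained sketch that drains a StringReader through a deliberately tiny buffer. The chunk_sizes helper is a hypothetical name I made up for illustration; it is not part of the crate.

```rust
use std::io::{Read, Result};
use std::slice::Iter;

// Same shape as the StringReader shown above.
pub struct StringReader<'a> {
    iter: Iter<'a, u8>,
}

impl<'a> StringReader<'a> {
    pub fn new(data: &'a str) -> Self {
        Self { iter: data.as_bytes().iter() }
    }
}

impl<'a> Read for StringReader<'a> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        for i in 0..buf.len() {
            if let Some(x) = self.iter.next() {
                buf[i] = *x;
            } else {
                return Ok(i);
            }
        }
        Ok(buf.len())
    }
}

// Hypothetical helper: repeatedly calls `read` with a buffer of `buf_len`
// bytes and records how many bytes each call returned.
fn chunk_sizes<R: Read>(mut reader: R, buf_len: usize) -> Result<Vec<usize>> {
    let mut buf = vec![0u8; buf_len];
    let mut sizes = Vec::new();
    loop {
        let n = reader.read(&mut buf)?;
        sizes.push(n);
        if n == 0 {
            break; // Ok(0) signals the end of the input
        }
    }
    Ok(sizes)
}

fn main() {
    // Five bytes read through a two-byte buffer.
    let sizes = chunk_sizes(StringReader::new("hello"), 2).unwrap();
    println!("{:?}", sizes); // [2, 2, 1, 0]
}
```

The final Ok(0) is what tells callers such as read_to_string that the input is exhausted.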

Using StringReader in practice

Using the newly created crate, I could write relatively simple tests like this one:

use stringreader::StringReader;
// init parser
let input = StringReader::new("#N");
parser.parse(input);
// assertions

Conclusions

I was really surprised that std::io::Read isn't implemented for Rust's string type(s), but in the end this allowed me to contribute a (imho) useful crate to the community. That Rust's slices provide an iterator was very helpful in accomplishing the task. All in all I'm satisfied with the crate and I hope it'll help others as well.

Thanks for reading!

Update

Someone hinted that, while std::io::Read is not implemented for Strings, it is implemented for &'a [u8], which is incidentally what String's as_bytes method returns. Therefore, for example in a test, you can simply do "foobar".as_bytes() and you get a byte slice that implements std::io::Read.

Therefore the stringreader crate is obsolete.
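A minimal sketch of that approach, using only the standard library. The count_lines function is a hypothetical stand-in for anything that is generic over Read, such as the parse method from the beginning of this post.

```rust
use std::io::Read;

// Hypothetical stand-in for Parser::parse: any function generic over Read.
fn count_lines<T: Read>(mut input: T) -> std::io::Result<usize> {
    let mut text = String::new();
    input.read_to_string(&mut text)?;
    Ok(text.lines().count())
}

fn main() {
    // &[u8] implements Read, so a string literal needs no wrapper type.
    let n = count_lines("#N Glider\nbo$2bo$3o!".as_bytes()).unwrap();
    println!("{}", n); // 2
}
```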

Thursday, March 23, 2017

Bringing Rust's Result type to Java

The Rust programming language was designed without exceptions to handle errors. Instead, the concept of errors is addressed with the generic Result<T, E> enum type. In this post I will compare Rust's and Java's error handling mechanisms and discuss if and how Rust's way of doing it can be applied to Java.

Java exceptions

Java's error handling mechanism is built around the Throwable class. Every class that is a subclass of Throwable can be thrown and caught in a try-catch block. The classes Error and Exception are subclasses of Throwable, and whether a type inherits from Error or Exception is the first main distinction: Errors are really exceptional (no pun intended) situations that shouldn't be caught or handled by the overwhelming majority of programs. Exceptions on the other hand are the bread and butter for Java developers.

  +-----------+
  | Throwable |-------+
  +-----------+       |
        |             |
        V             V
    +-------+   +-----------+
    | Error |   | Exception |-----------+
    +-------+   +-----------+           |
                      |                 |
                      V                 V
          +------------------+   +-------------+
          | RuntimeException |   | IOException |
          +------------------+   +-------------+

When Java was designed, the decision was made to add the concept of checked exceptions. A checked exception is any class that inherits from Exception but not from RuntimeException (which is itself a direct subclass of Exception). Whenever your code calls a method that may throw a checked exception, you have to handle it, either by adding a compatible throws declaration to your method or by catching it in a try-catch block.

Rust Results

Rust's error handling mechanism is built around the generic Result<T, E> enum type. The enum is defined as follows:

pub enum Result<T, E> {
    Ok(T),
    Err(E),
}

A function that returns a string but can fail (recoverably) will define its return type as Result<String, E> where E is the type that is returned in the error case. The simplest possible Result is actually Result<(), ()>, where both the happy and the error case carry the unit type (an empty tuple) as the payload. Even this case conveys a minimal amount of information: whether the operation succeeded or failed, and it does so in a more explicit way than a boolean return type could achieve.
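As a rough illustration of both shapes, here is a sketch with two made-up functions (check_even and greet are hypothetical names, not taken from any library): one that only signals success or failure, and one that returns a String or a descriptive error.

```rust
// Hypothetical example: an operation that can only succeed or fail.
fn check_even(n: u32) -> Result<(), ()> {
    if n % 2 == 0 {
        Ok(())
    } else {
        Err(())
    }
}

// Hypothetical example: a fallible function returning a String,
// using String as the error type E.
fn greet(name: &str) -> Result<String, String> {
    if name.is_empty() {
        Err("name must not be empty".to_string())
    } else {
        Ok(format!("Hello, {}!", name))
    }
}

fn main() {
    println!("{:?}", check_even(4));
    println!("{:?}", check_even(3));
    println!("{:?}", greet("Ferris"));
    println!("{:?}", greet(""));
}
```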

Similar to Java's Error and Exception types, Rust has a second way of 'handling' errors: the thread panic. When a thread panics it means something has gone horribly wrong. Thread panics are not meant to be caught. Even though panics are a comparatively extreme measure, they're not without use.

When you design your API you have to think about what kinds of errors deserve to be treated as recoverable and what kinds are, for example, the result of incorrect usage. An example from the Java world would be the well-known NullPointerException: You usually wouldn't define a catch block that handles an NPE, because it's usually the developer's fault. A NumberFormatException on the other hand could very well be the result of a user entering an invalid value.

An example of this kind of consideration in API design is Rust's Vec type: the get method returns an Option<&T>, so trying to get an index that doesn't exist in the data structure will never panic; instead it tells you that there's nothing under that index by returning None. However, Vec also allows you to access its elements by index, using the square bracket syntax (foo[5]), but it's important to know that this will make the thread panic if the index is out of bounds. It's important to get this design right because it greatly influences the usability of your API - panic too often and the users of your API need to do a lot of verifications; overuse the Result type and developers need to handle them all over the place - in both cases the usability of your API suffers.
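The two access styles can be sketched like this. The element_or helper is a hypothetical function written for this example only.

```rust
// Hypothetical helper: look up an element without risking a panic,
// falling back to a default when the index is out of bounds.
fn element_or(values: &[i32], index: usize, default: i32) -> i32 {
    match values.get(index) {
        Some(v) => *v,
        None => default,
    }
}

fn main() {
    let foo = vec![10, 20, 30];

    // get returns an Option, so a missing index is just None.
    println!("{:?}", foo.get(1));
    println!("{:?}", foo.get(5));
    println!("{}", element_or(&foo, 5, -1));

    // Indexing is fine as long as the index is in bounds ...
    println!("{}", foo[1]);
    // ... but foo[5] would make the thread panic, so it is left out here.
}
```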

A Result type for Java

I've created the result-flow library, which brings a Result interface and the implementing classes Ok and Err. It's located in the Nexus repository here.

Consider the following example:

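import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
// plus the import of Result from the result-flow library

public class Numbers {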
    public static void main(final String[] args) {
        final Result<Integer, String> result = readLine()
            .andThen(Numbers::parseInt)
            .map(Numbers::doubleUp);
        System.out.println(result);
    }
 
    private static Integer doubleUp(final Integer value) {
        return value * 2;
    }
 
    private static Result<Integer, String> parseInt(final String input) {
        try {
            return Result.ok(Integer.parseInt(input));
        } catch (final NumberFormatException e) {
            return Result.err(e.getMessage());
        }
    }
 
    private static Result<String, String> readLine() {
        try {
            final InputStreamReader in = new InputStreamReader(System.in);
            final BufferedReader buf = new BufferedReader(in);
            return Result.ok(buf.readLine());
        } catch (final IOException e) {
            return Result.err(e.getMessage());
        }
    }
}

The main method reads a line from stdin, then parses the read line to an Integer and finally doubles the value. As you can see, this code does not handle an error at all, it simply prints the result at the end. If the user enters a valid integer the output will be something like Ok(14). Should the user input something like 'a', the output will be Err(For input string: "a"), so the Err wraps the message of the NumberFormatException.

Notice the difference between andThen and map: The former is used when the method to be called returns a Result, whereas the latter is used when that method does not fail with a Result itself.
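For comparison, here is how the same chain might look in Rust itself, where the corresponding methods are called and_then and map. This is only a sketch: parse_int, double_up and process are hypothetical names mirroring the Java example, and the input line is passed in as a parameter instead of being read from stdin.

```rust
// Fallible step: returns a Result, so it is chained with and_then.
fn parse_int(input: &str) -> Result<i32, String> {
    input.trim().parse::<i32>().map_err(|e| e.to_string())
}

// Infallible step: returns a plain value, so it is chained with map.
fn double_up(value: i32) -> i32 {
    value * 2
}

// Mirrors readLine().andThen(parseInt).map(doubleUp) from the Java example.
fn process(line: Result<String, String>) -> Result<i32, String> {
    line.and_then(|s| parse_int(&s)).map(double_up)
}

fn main() {
    println!("{:?}", process(Ok("7".to_string()))); // Ok(14)
    println!("{:?}", process(Ok("a".to_string())));
    println!("{:?}", process(Err("stdin closed".to_string())));
}
```

If any step yields an Err, the remaining steps are skipped and the Err is passed through unchanged.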

Notice also that an IOException that occurs when we try to read from the InputStream will also be wrapped in an Err. This obviously doesn't make a lot of sense in production code. Depending on the context an IOException would rather be treated as an exceptional or unrecoverable error.

Hence, my advice would be to keep any truly exceptional and/or unrecoverable errors like the aforementioned panics in Rust and use the conventional try-catch block on some level in the call stack. For errors of the application domain however, I think the pattern could be applicable on the JVM.

Error types

The Result type is generic, so any type of error (or ok value of course) is possible. In the Rust world a common pattern is to use enums as error types, but depending on the necessary information structs are not unheard of in this role either. When you use a library (or crate for Rustaceans) that returns Results it is typical to either wrap or translate the erroneous values into a type of the domain of the application, typically an enum.

pub enum ApplicationError {
  AppError,         // some meaningful error in the application
  DbError(PgError), // wraps an error of the database connector
}

Rust enums are more powerful than Java's in the sense that they can wrap values, whereas Java's enum instances are static. This is easily overcome in Java by using regular classes and their instances instead, but that does not help with the language-specific problems described in the following sections.
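To make the wrapping pattern a bit more concrete, here is a sketch that uses only the standard library. The variant names and the parse_port function are hypothetical; ParseIntError merely plays the role that PgError plays in the enum above.

```rust
use std::num::ParseIntError;

// Application-level error type: one plain variant and one variant
// that wraps the error of a lower-level API.
#[derive(Debug)]
pub enum ApplicationError {
    EmptyInput,
    InvalidNumber(ParseIntError),
}

// Hypothetical function that translates a library error into the
// application's own error type.
fn parse_port(input: &str) -> Result<u16, ApplicationError> {
    if input.is_empty() {
        return Err(ApplicationError::EmptyInput);
    }
    // The variant constructor is used as the mapping function.
    input.parse::<u16>().map_err(ApplicationError::InvalidNumber)
}

fn main() {
    println!("{:?}", parse_port("8080"));
    println!("{:?}", parse_port(""));
    println!("{:?}", parse_port("eighty"));
}
```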

Match expressions

Rust's match can be compared to Java's switch, but it is much more powerful. For instance, the Java compiler will not complain about a switch statement over an enum that is not exhaustive, whereas Rust will fail the compilation if not all enum variants have been addressed. Furthermore, Rust's match can actually look into the provided enum and bind the contained value to variables. This is one shortcoming that cannot easily be helped in Java. Less important in this context but nonetheless worth mentioning: Rust's match is an expression and can return a value, whereas Java's switch is a statement.

let foo: Result<String, i8> = Ok("Hi!".to_string());
match foo {
  Ok(x) => println!("Got Ok: {}", x),
  Err(f) => println!("Got error: {}", f),
}
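The "match is an expression" point can be sketched as follows. The describe function is a hypothetical name; it returns the value of the match directly and also uses a guard on one of the arms.

```rust
// match is an expression: each arm yields the value of the whole match,
// which here becomes the return value of the function.
fn describe(result: Result<String, i8>) -> String {
    match result {
        Ok(x) => format!("Got Ok: {}", x),
        // A guard narrows the arm to negative error codes only.
        Err(code) if code < 0 => format!("Got negative error code: {}", code),
        Err(code) => format!("Got error code: {}", code),
    }
}

fn main() {
    println!("{}", describe(Ok("Hi!".to_string())));
    println!("{}", describe(Err(-3)));
    println!("{}", describe(Err(7)));
}
```

Removing the last arm would make the match non-exhaustive and the compiler would reject it.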

Macros

Rust's support for macros adds greatly to the usefulness of the Result enum, because it enables a function to not explicitly handle an error but to stop the execution and return the error. This is closely related to Java's throws declaration.

fn foo() -> Result<String, ()> {
  let b = bar()?;
  let c = try!(bar());
 
  // do something with b and c, then return a value
  Ok(b + &c)
}
 
fn bar() -> Result<String, ()> {
  Err(())
}

In the example above, function foo calls function bar. Both functions have the same return type. Rust's compiler complains about unhandled results, but foo doesn't want to handle any errors. Instead it uses the try!(<expr>) macro (which can also be written with the ? operator as <expr>?) that generates the necessary code to return an eventual error early from the function. The closest Java equivalent, based on checked exceptions, can be seen in the next code sample. For a Result type implemented as a library, however, this early return cannot be mimicked in Java.

public String foo() throws MyError {
  final String b = bar();
  final String c = bar();
 
  // do something with b and c, then return a value
  return b + c;
}
 
public String bar() throws MyError {
  throw new MyError();
}
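Back on the Rust side, the early return can also be spelled out by hand, which shows roughly what the macro generates. This is a simplified sketch, not the exact expansion; the fail parameter on bar is a hypothetical addition so that both outcomes can be tried.

```rust
// Hypothetical variant of bar that can succeed or fail on demand.
fn bar(fail: bool) -> Result<String, ()> {
    if fail {
        Err(())
    } else {
        Ok("bar".to_string())
    }
}

// Using the ? operator: an Err is returned from foo immediately.
fn foo_with_operator(fail: bool) -> Result<String, ()> {
    let b = bar(fail)?;
    Ok(b)
}

// Roughly the same logic written out by hand.
fn foo_by_hand(fail: bool) -> Result<String, ()> {
    let b = match bar(fail) {
        Ok(value) => value,
        // The error is converted via From and returned early.
        Err(e) => return Err(From::from(e)),
    };
    Ok(b)
}

fn main() {
    println!("{:?}", foo_with_operator(true));
    println!("{:?}", foo_by_hand(true));
    println!("{:?}", foo_by_hand(false));
}
```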

Conclusion

The biggest disadvantage that I see with Rust's Result type in Java is that it breaks with the idiomatic way to code in Java and that the developer has to think very carefully about which errors to encode in a Result and which to throw as RuntimeExceptions (the counterpart of panics in Rust). Third-party libraries and the parts of the standard library that rely on checked exceptions are a major difficulty. Those will most probably have to be wrapped with try-catch and converted to either RuntimeExceptions or Results.

The great advantage of the approach is the way it enables a more functional type of programming, like version 8 of Java did with the Optional type. I have yet to try the library in any type of project apart from small experiments. Should you try it out I'll be glad to have your feedback and thoughts about it.

Monday, March 6, 2017

My shot at RESTful Microservices in Rust - Part 3

Part 3 - Linking REST endpoint and db layer

Welcome to part 3 of my Rust microservices series! If you haven't read parts 1 or 2, here are the respective links: part 1, part 2. In this installment I'm going to connect the REST endpoint with the database layer and take care of serialization and deserialization of the Rust structs.

JSON serialization

There are several crates that give you automatic serialization and deserialization of structs to JSON strings. I'm going to use Serde in this PoC. Serde is divided into a core crate and one additional crate per source/target format. So I'm going to use the crates serde, serde_derive and serde_json. The crate serde_derive contains the Serialize and Deserialize derive macros that implement the traits with the same names. This enables us to serialize a struct by calling serde_json::to_string.

src/models/game.rs:

#[derive(Debug, Serialize)]
pub struct DbGame { /* omitted. */ }
 
#[derive(Debug, Serialize)]
pub struct Dimensions { /* omitted. */ }

src/main.rs:

#[macro_use] extern crate serde_derive;
extern crate serde_json;
 
fn main() {
    for game in dao::get_games() {
        println!("{:?}", serde_json::to_string(&game).unwrap());
    }
}

Unsurprisingly, deserialization works the same way.

Connecting the REST endpoint to the database

I'm gonna create a simple endpoint listening on GET /games that will return a list of all games. src/main.rs:

fn main() {
    let mut server = Nickel::new();
    server.get("/games", middleware! {|_req, mut resp|
        resp.set(MediaType::Json);
        let games = dao::get_games();
        serde_json::to_string(&games).unwrap()
    });
    server.listen("0.0.0.0:8080")
        .expect("Error starting server");
}

When I cURL this endpoint I get

< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Sun, 05 Mar 2017 18:26:36 GMT
< Server: Nickel
< Transfer-Encoding: chunked
<
* Connection #0 to host localhost left intact
[{"id":1,"dimensions":{"x":3,"y":3}},{"id":2,"dimensions":{"x":4,"y":5}}]

Deserializing JSON

So now that we've got a working endpoint that lists all the games, let's add one that actually creates a game. I'm gonna keep things simple here and let the caller choose the id of the game and not care about key uniqueness issues for this PoC. The first step is to add the Deserialize macro to the entity structs. After that it's mostly about the dao and the controller code.

src/main.rs:

// ...
server.post("/games", middleware! {|req, mut resp|
    match get_game_from_request(req) {
        Ok(game) => {
            resp.set(StatusCode::Created);
            dao::create_game(game);
            "Ok!".to_string()
        },
        Err(e) => {
            resp.set(StatusCode::BadRequest);
            e
        }
    }
});
// ...
fn get_game_from_request(
    req: &mut nickel::Request,
) -> Result<DbGame, String> {
    let mut body = String::new();
    req.origin.read_to_string(&mut body).unwrap();
    serde_json::from_str::<DbGame>(&body)
        .map_err(|e| e.description().to_string() )
}

Nickel provides built-in JSON deserialization, but this feature relies on the rustc_serialize crate, which I'm not using. Serde is a newer and more modular implementation for serialization and deserialization. The get_game_from_request function extracts the body from the request and then tries to deserialize it. The database access code is straightforward:

game_dao.rs:

pub fn create_game(game: DbGame) {
    let conn = connect();
    conn.execute(r#"
        INSERT INTO games (id, dimension_x, dimension_y)
        VALUES ($1, $2, $3)"#,
        &[&game.id, &game.dimensions.x, &game.dimensions.y]
    ).expect("Error inserting into database");
}

As promised at the beginning, I don't care about primary key uniqueness in this PoC, so if you try to POST a game with an id that's already there, the thread is going to panic.
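The same trade-off applies to the unwrap in get_game_from_request: if reading the body fails, the thread panics instead of answering with a 400. A minimal sketch of how that read could be turned into a Result, using only the standard library and assuming the body is available as something that implements std::io::Read (the way req.origin is used above). The read_body helper is hypothetical and not part of Nickel.

```rust
use std::io::Read;

// Hypothetical helper, not part of Nickel: reads a whole body into a String
// and reports failures as Err(String) instead of panicking.
fn read_body<R: Read>(mut source: R) -> Result<String, String> {
    let mut body = String::new();
    source
        .read_to_string(&mut body)
        .map_err(|e| format!("could not read request body: {}", e))?;
    Ok(body)
}

fn main() {
    // A byte slice stands in for the request body here.
    let ok = read_body(r#"{"id":1}"#.as_bytes());
    println!("{:?}", ok);

    // Bytes that are not valid UTF-8 produce an Err instead of a panic.
    let invalid_utf8: &[u8] = &[0xff, 0xfe];
    println!("{}", read_body(invalid_utf8).is_err());
}
```

The resulting Err(String) would fit the existing Err(e) arm of the POST handler, which already responds with BadRequest.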

Conclusions

We've seen that it is possible to create microservices in Rust with little effort, even though compared to older languages there's more boilerplate code that you have to write yourself. Nickel in particular seems to have a lot of room for improvement. I don't like that you seem to have to return a String from every endpoint definition in the middleware! macro, but then I'm not very good at reading macro definitions in Rust yet.

One could think that interacting with postgres directly and not using an OR-Mapper is a bad idea, but I think that especially in microservices, the number of entities is usually small enough for that not to matter too much.

This concludes the third and last part of this proof of concept. You can find the source code here. Thanks for reading and please feel free to comment.