Learn Rust by building a Brainfuck interpreter, part 2: the execution engine

Introduction

In the last post, we created the first building blocks for our Brainfuck interpreter in Rust:

  • A data structure for the tape, which supports four operations:

    • left and right move the data pointer,
    • get returns the u8 value that is stored in the current cell,
    • set assigns a new u8 value to the current cell.
  • A data type for the instructions. It is based on Rust's enums, which are much more powerful than enums in C or C++ because they allow to build algebraic data types:

    #[derive(Debug)]
    enum Instruction {
        Inc,                    // +
        Dec,                    // -
        Left,                   // <
        Right,                  // >
        Read,                   // ,
        Write,                  // .
        Loop(Vec<Instruction>)  // [...]
    }
    

    This definition allows to represent a Brainfuck program as an abstract syntax tree, which makes program execution simple. However, it requires a bit more work when parsing the program, as we will see in a later post in this series.

In this post, we will implement the next part of the interpreter: an execution engine which applies instructions to the state of the tape, and performs input and output.

Importing Rust files into a Jupyter notebook

I've taken the code from the Jupyter notebook that the last post was based on and copied it to proper Rust files in the directory for the new blog post: tape.rs, instructions.rs. The Jupyter kernel for Rust allows to import these into the notebook that is the source of the blog post which you are currently reading:1

In [2]:
:dep rust_bf = { package = "rust-bf", path = "." }

For convenience, we will import all variants of the Instruction enum and the Tape data structure:

In [3]:
use rust_bf::{instructions::{Instruction, Instruction::*}, tape::Tape};

We can now create a tape and manipulate it:

In [4]:
let mut t = Tape::new();
t.right();
t.set(42);
t.left();
t.left();
t.set(t.get() + 5);
t
Out[4]:
Tape { data: [5, 0, 42], pos: 0 }

Even though we cannot write Brainfuck programs in the usual way, we can construct the abstract syntax tree using the variants of the Instruction type directly:

In [5]:
// 1. Read a byte value from standard input.
// 2. Write all numbers from this value down to zero to standard output.
let program_countdown = vec![
    Read,
    Loop(vec![Write, Dec]),
    Write
];

But how can we execute the program?

Step 1: programs that do not need I/O

We'll start simple and look at programs which do not use the Read and Write instructions. Since such a program does not create any output, we can only observe the modifications to the tape.

This is a program that will aggregate the values of neighbouring non-zero cells to one of these cells, and then terminate:2

In [6]:
let program_add = [
    // Go right until the current cell is zero.
    Loop(vec![Right]),  

    // Go to the second non-zero cell from the right.
    Left,
    Left,

    // As long as the current cell is non-zero:
    // * add the value of the right neighbour cell
    // * move the data pointer one cell to the left
    Loop(vec![
        Right,
        Loop(vec![Dec, Left, Inc, Right]),
        Left,
        Left
    ])
];

So if the initial state of the tape is such that it contains the values 3, 4, and 8 surrounded by zeroes,

... 0 3 4 8 0 ...

then we would expect that after runing the program, there is only one non-zero cell remaining, which contains the sum of the values:

... 0 15 0 0 0 ...

How can we achieve this?

We will first implement the skeleton of the execution function, which takes two parameters for the time being:

  • tape is a mutable reference to the tape (&mut Tape). It needs to be mutable because the tape contents will be modified during the execution.
  • instructions is a slice of instructions (&[Instruction]). Essentially, a slice combines a pointer to the start of a range of items, and the length of the range. Using a slice, rather than a Vec, is more flexible because this allows to call the function not only with a Vec, but also with fixed-size arrays and parts of a Vec or an array. Note that the Rust compiler will convert a reference to a Vec automatically to a slice when calling the function.3

The function will then loop over instructions.

Let's have a look at what we have so far and just print each instruction before we will see how to evaluate them:

In [7]:
fn execute_v1(tape: &mut Tape, instructions: &[Instruction]) {
    for instruction in instructions {
        println!("execute instruction: {:?}", instruction);
    }
}

let mut t = Tape::new();
execute_v1(&mut t, &program_add);
execute instruction: Loop([Right])
execute instruction: Left
execute instruction: Left
execute instruction: Loop([Right, Loop([Dec, Left, Inc, Right]), Left, Left])

Now we want to do something useful with our instructions.

Enum values, such as our instructions of type Instruction, are usually evaluated using pattern matching in Rust. A match expression looks like this:

In [8]:
for instruction in [Left, Right, Inc, Dec, Loop(vec![Dec, Right]), Read, Write] {
    let result = match instruction {
        Left => "left",
        Right => "right",
        Inc => "+",
        Dec => "-",
        Loop(body) => "loop",
        _ => "?"
    };
    println!("instruction: {} ", result);
};
instruction: left 
instruction: right 
instruction: + 
instruction: - 
instruction: loop 
instruction: ? 
instruction: ? 

It contains match arms, which consist of a pattern, and some code that is evaluated if the pattern matches the value. In this simple example, the code is just a string constant for each case.

Now we can think about what code to execute for each instruction:

  • Left and Right are easy: for these, we just have to call tape.left() and tape.right(), respectively.
  • For Inc, we have to increase the value of the current cell: tape.set(tape.get() + 1)
  • Analogously for Dec: tape.set(tape.get() - 1)

When we encounter a Loop instruction, we have to check if the current cell is zero. If that is not the case, we repeatedly execute all instructions in the loop body until the current cell becomes zero:

while tape.get() != 0 {
    execute(&mut tape, body)
}

If we put everything together, we end up with this function that can execute any Brainfuck program without Read and Write instructions:

In [9]:
fn execute_v2(tape: &mut Tape, instructions: &[Instruction]) {
    for instruction in instructions {
        match instruction {
            Left => tape.left(),
            Right => tape.right(),
            Inc => tape.set(tape.get() + 1),
            Dec => tape.set(tape.get() - 1),
            Loop(body) => {
                while tape.get() != 0 {
                    execute_v2(tape, body)
                }
            }
            _ => panic!("Read and Write are not handled yet!")
        }
    }
}

Note that we must handle all cases in the match expression, or the code will not compile. Here we use the panic! macro, which usually aborts the process with the given error message, if the Read or Write instructions are used.

We can now try to execute our program that sums the numbers on the tape:

In [10]:
let mut t = Tape::new();
t.set(3);
t.right();
t.set(4);
t.right();
t.set(8);
t.left();
t.left();

println!("Initial state: {:?}", t);

execute_v2(&mut t, &program_add);

println!("Final state:   {:?}", t);
Initial state: Tape { data: [3, 4, 8], pos: 0 }
Final state:   Tape { data: [0, 15, 0, 0, 0], pos: 0 }

It works!

Note that every cell which has once been the current cell has got the value zero. This is not significant though because every cell which has not been visited yet has the value zero implicitly.

Step 2: input and output

So far, we have ignored the instructions Read and Write (denoted by ',' and '.' in Brainfuck source code).

We can implement match arms for them in the execution function as follows.

For Read, we could access standard input with the function std::io::stdin() , and read a byte from it with read_exact(...). The following function shows how it works:

In [11]:
use std::io::Read;  // needed to bring read_exact(...) into scope

fn read() -> u8 {
    let mut buf: [u8; 1] = [0];
    std::io::stdin().read_exact(&mut buf).unwrap();
    buf[0]
}

read_exact(...) tries to fill the given u8 slice with bytes from standard input. The reason why we call unwrap() on its return value is the following:

In Rust, functions that can fail usually return a value of the type Result<T, E>, where T is the type that a successful execution of the function would yield, and E is an error type. For read_exact(...), which returns no useful information in the successful case, T is (), the empty type. Result is an enum with two variants:

  • A value of type Ok(T) is returned if the function execution was successful.
  • An Err(E) signals that an error occurred.

Therefore, it is impossible to forget error handling if the return value is used. There are several ways to work with results and errors. Here we choose to call .unwrap() on the result, which unwraps the value from Ok, and panics if the result is actually an Err. Using unwrap is useful in two situations:

  • If we are 100% sure that there cannot be an Err value in the result because we know that some preconditions are fulfilled which guarantee success.
  • If a panic is acceptable, e.g., because we are just experimenting, and not writing production code.

Note that since we do not need the empty result value of read_exact(...) at all in this particular case, we could in principle forget to handle errors. The code would compile just fine without calling .unwrap(), but the compiler would warn about the lack of error handling (at least outside Jupyter notebooks).

In a future post, we will discuss how to handle all errors properly in our Brainfuck interpreter.

Similarly, we can use std::io::stdout() to access standard output for the Write instruction, and write a byte to it with write_all(...):

In [12]:
use std::io::Write;  // needed to bring `write(...)` into scope

fn write(value: u8) {
    std::io::stdout().write(&[value]).unwrap();
}

Similar to what we saw for reading, write(...) will attempt to write bytes from a u8 slice to standard output, and return Ok(n) if n bytes were written successfully. Again, we simply unwrap the result value and postpone proper error handling to a future blog post.

The function which executes Brainfuck code, including input and output, now looks like this:

In [13]:
fn execute_v3(tape: &mut Tape, instructions: &[Instruction]) {
    for instruction in instructions {
        match instruction {
            Left => tape.left(),
            Right => tape.right(),
            Inc => tape.set(tape.get() + 1),
            Dec => tape.set(tape.get() - 1),
            Loop(body) => {
                while tape.get() != 0 {
                    execute_v3(tape, body)
                }
            }
            Read => {
                let mut buf: [u8; 1] = [0];
                std::io::stdin().read_exact(&mut buf).unwrap();
                tape.set(buf[0])
            }
            Write => {
                std::io::stdout().write(&[tape.get()]).unwrap();
            }
        }
    }
}

To test it, we will try the following program which is the equivalent of

print!("#\n");

in Rust:

In [14]:
let print_hash = vec![
    // load ASCII value for '#' (35)
    Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc,
    Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc,
    Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc,
    Inc, Inc, Inc, Inc, Inc,

    // write to stdout
    Write,

    // clear the current cell
    Loop(vec![Dec]),

    // load ASCII value for '\n' (10)
    Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc, Inc,

    // write to stdout
    Write
];
In [15]:
let mut t = Tape::new();
execute_v3(&mut t, &print_hash);
#

Input also works, but cannot be demonstrated easily in a Jupyter notebook.

Making input and output more flexible and testable with the traits Read and Write

We will now make the function execute(...) more generic, such that it does not use standard input and standard output directly. Instead, we will make the function accept parameters which define where data should be read from and written to. We can then just pass std::io::stdin() and std::io::stdout() to this function to get the same behavior that we had so far.

This approach has some advantages

  • We can simulate input even in a Jupyter notebook.
  • More importantly: we can write unit tests for our execution engine. These tests can provide input data and verify output data.

How does this work?

Here is a generic function that writes a byte to a given destination:

In [16]:
fn write<W: Write>(dest: &mut W, value: u8) {
    let mut buf: [u8; 1] = [value];
    dest.write_all(&mut buf).unwrap();
}

The function now accepts a parameter of the generic type W. The type is not completely arbitrary: the angle brackets between function name and parameter list tell that the type must implement the Write trait. Without this restriction, the compiler would not accept the call to write_all(...) because this function is provided by the trait.

Rust traits are a bit like interfaces in, e.g., Java and concepts in C++.4 They describe what conditions a type must fulfil. The Write trait describes things that bytes can be written to. So we can call this function with standard output:

In [17]:
write(&mut std::io::stdout(), b'#');
write(&mut std::io::stdout(), b'\n');
#

But we can also use other types which implement Write. This includes Vec<u8>, a vector of bytes. The written bytes will just be appended to the vector:5

In [18]:
let mut data = Vec::new();
write(&mut data, b'#');
write(&mut data, b'\n');
println!("data={:?}", data);

// We can also interpret the Vec<u8> as an UTF-8 string.
// Note that from_utf8(...) returns a Result because it
// would fail for input which is not valid UTF-8:
println!("data as str: {:?}", std::str::from_utf8(&data));
data=[35, 10]
data as str: Ok("#\n")

We can make the input that we read from generic in the same way by using the Read trait.

Our generic execution function then looks like this:

In [19]:
fn execute_v4<R: Read, W: Write>(tape: &mut Tape, instructions: &[Instruction], input: &mut R, output: &mut W) {
    for instruction in instructions {
        match instruction {
            Left => tape.left(),
            Right => tape.right(),
            Inc => tape.set(tape.get() + 1),
            Dec => tape.set(tape.get() - 1),
            Loop(body) => {
                while tape.get() != 0 {
                    execute_v4(tape, body, input, output)
                }
            }
            Read => {
                let mut buf: [u8; 1] = [0];
                input.read_exact(&mut buf).unwrap();
                tape.set(buf[0])
            }
            Write => {
                output.write(&[tape.get()]).unwrap();
            }
        }
    }
}

To test this execution function, we will use the countdown program that we saw earlier:

In [20]:
// 1. Read a byte value from standard input.
// 2. Write all numbers from this value down to zero to standard output.
let program_countdown = vec![
    Read,
    Loop(vec![Write, Dec]),
    Write
];

Given a byte with the value 10 as input, the program produces this output:6

In [21]:
{
    let ten = vec![10];
    let mut input: &[u8] = &ten;
    let mut output = Vec::new();
    let mut t = Tape::new();
    
    execute_v4(&mut t, &program_countdown, &mut input, &mut output);

    println!("remaining input: {:?}", input);
    println!("output: {:?}", output);
};
remaining input: []
output: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

So it works as expected 🙂

Summary and outlook

We have combined the tape data structure and the instruction data type from the last post, and implemented a function that can execute the abstract syntax tree for any Brainfuck program in the context of the tape. Input and output were handled in a generic way using the Read and Write traits, such that program execution can be tested easily.

In the next post, we will implement a parser that transforms Brainfuck source code to an abstract syntax tree.


  1. The documentation for the Rust Jupyter kernel describes how to do this (search for "You can use the local work-in-progress crate like this").

  2. Note that Brainfuck programs usually operate on a tape which has all cells initialized to zero. So strictly speaking, a program that only sums the initial cell values does not make much sense. It is only used to test our intermediate step on the way to a full-featured Brainfuck execution engine here.

  3. This feature is called deref coercion.

  4. Our generic function write<W: Write>(...) is like a template in C++ in the sense that the compiler will generate separate assembly for each type that the function is used with. So our use of traits to define conditions that the type must fulfil corresponds more to concepts in C++ than to interfaces in Java. However, traits can also be used in a more dynamic way, such that only one version of the function exists in assembly and machine code, and dispatch is done dynamically at runtime with virtual function calls. This is more like what interfaces in Java and abstract base classes in C++ are used for.

    Here is an example function, which gets a so-called trait object as a parameter:

    fn write_trait_object(w: &mut dyn Write, value: u8) {
        let mut buf: [u8; 1] = [value];
        w.write_all(&mut buf).unwrap();
    }
    

    It can be called with standard output as the writer like this:

    write_trait_object(&mut std::io::stdout(), b'#');
    write_trait_object(&mut std::io::stdout(), b'\n');
    

    The differences to our earlier function write<W: Write>(...) are:

    • Only one version of the function exists for all types in machine code, so the size of the compiled program might be lower.
    • Due to dynamic dispatch at runtime, there may be a small performance penalty. Moreover, the compiler cannot perform optimizations for specific types that the function is used with. This could harm performance and increase the size of the compiled program.

    It can be tempting to use static dispatch and have the compiler generate optimal code for each type to improve the performance, but adding type annotations to functions and structs also has costs. In particular, it can harm developer productivity if done too excessively. The other day, I read a very interesting blog post about a refactoring to static dispatch that the author considered a mistake.

  5. Note that we do not have to state the type Vec<u8> explicitly when we create the Vec with Vec::new(). We could do it in two different ways:

    let mut data: Vec<u8> = Vec::new();
    

    or

    let mut data = Vec::<u8>::new();
    

    But the Rust compiler will deduce the type automatically here because we call the write(...) method of the Write trait on it, which Vec<T> only implements for T is u8.

  6. I have wrapped the code in this cell in braces ({...}) because compilation will fail with this error otherwise:

    Error: The variable input contains a reference with a non-static lifetime so can't be persisted. You can prevent this error by making sure that the variable goes out of scope - i.e. wrapping the code in {}.

    You would not have this problem outside a Jupyter notebook, because then input would have a well-define life time.

    Another way to fix this issue besides the braces would be to redefine the variable input:

    let input = 0;
    

    Then the old variable input would also go out of scope. In Rust, variable names can be reused in the same scope, which can be confusing at first. I found this feature quite useful though after I got used to it.

Comments