CS595: Virtual Machines

Overview

For this project, you will be completing the implementation of the Hawknest emulator. This emulator will be built around the 6502v processor, a version of the MOS 6502 CPU which I've modified slightly for our convenience. The v indicates that this CPU supports paravirtual extensions (namely, a new instruction).

The goals of this project are to:

Give you experience working with a fairly large C source tree
Help you understand how system emulation works
Refresh your understanding of ISAs and the internal operation of CPUs
Have fun!

Keep in mind that this project carries many of the imperfections of real-world code. If we encounter bugs or hiccups, we will work through them together.

Part 1: Getting Started

You will first need to download the Hawknest skeleton code. We will be working primarily with git, so make sure you are familiar with it and that you have it installed on your system. I would recommend working on this project either on a Linux machine or a Linux VM, ideally something you have sufficient permissions on to install software packages. To get the code, run the following in your VM:

$> git clone http://cs.iit.edu/~khale/class/vm-class/f18/hawknest-skeleton

This will fetch the code from a public git repo. You should see something like this:

$> cd hawknest-skeleton
$> ls
cfg  cscheme  emu  lib  Makefile  
mmc1.monopic  palette  roms  sim6502.lib  tags  test

If you try to build the project at this point, it will likely fail because you haven't installed the requisite libraries. See the next section for how to proceed.

Part 2: Building and Running Hawknest

This section will get you to the point where you can build and run the Hawknest emulator.

Prerequisites

Before we build Hawknest, we need to make sure that we have the required libraries installed on the system. These include:

SDL2 and its development headers
GNU Readline and its development headers
libasan and libubsan

All of these can be installed using your distro's package manager. For example, I'm on a Fedora system so I would use the dnf package manager to install them as follows:

$> sudo dnf install SDL2-devel readline-devel libasan libubsan

SDL2 provides window creation and rendering, while GNU Readline powers history and line editing in the Hawknest Shell. libasan and libubsan are required to compile Hawknest with various runtime checks that assist in bug-catching.

Building the Emulator

Once you have installed the prerequisites, you should be able to build the emulator as follows:

            
$> make
build/emu/gcc/debug/mos6502/vmcall.o <- emu/mos6502/vmcall.c
build/emu/gcc/debug/mos6502/mos6502-common.o <- emu/mos6502/mos6502-common.c
build/emu/gcc/debug/mos6502/mos6502.o <- emu/mos6502/mos6502.c
build/emu/gcc/debug/nes/io_reg.o <- emu/nes/io_reg.c
build/emu/gcc/debug/nes/nrom.o <- emu/nes/nrom.c
build/emu/gcc/debug/nes/sxrom.o <- emu/nes/sxrom.c
build/emu/gcc/debug/nes/mmc1.o <- emu/nes/mmc1.c
build/emu/gcc/debug/nes/ppu.o <- emu/nes/ppu.c
build/emu/gcc/debug/main.o <- emu/main.c
build/emu/gcc/debug/shell.o <- emu/shell.c
build/emu/gcc/debug/ines.o <- emu/ines.c
build/emu/gcc/debug/rc.o <- emu/rc.c
build/emu/gcc/debug/timekeeper.o <- emu/timekeeper.c
build/emu/gcc/debug/memory.o <- emu/memory.c
build/emu/gcc/debug/membus.o <- emu/membus.c
build/emu/gcc/debug/reset_manager.o <- emu/reset_manager.c
build/emu/gcc/debug/fileio.o <- emu/fileio.c

You should now have a binary called hawknest-gcc-debug in a newly-created bin/ directory. If you run it without any arguments (or with the -h flag), you will get a help message:

$> bin/hawknest-gcc-debug
Usage: bin/hawknest-gcc-debug [options] <rom-path>
Options:
  --palette or -p <path> : Use the NES palette at <path>
  --cscheme or -c <path> : Use the NES controller scheme at <path>
  --scale   or -s <int>  : Scale NES output by <int>
  --help    or -h        : Print this message
  --version or -V        : Print version information

Building ROMs (test code)

I've supplied you with a couple simple test programs in the test/ directory in the main source tree: hello.c and primes.c. hello.c tests writing a “Hello World” message to stdout using the paravirtual interface, while primes.c goes a step further to implement an interactive prime-number-checker. You can compile both of these to Hawknest ROMs by making the tests target in the project root:

            
$> make tests
build/test/hello.s <- test/hello.c
build/test/hello.o <- build/test/hello.s
build/lib/intr.s <- lib/intr.c
build/lib/intr.o <- build/lib/intr.s
build/lib/crt0.o <- lib/crt0.s
build/lib/ctype.o <- lib/ctype.s
build/lib/exehdr.o <- lib/exehdr.s
build/lib/mainargs.o <- lib/mainargs.s
build/lib/paravirt.o <- lib/paravirt.s
Linking bin/hawknest.lib...
Linking bin/hello
build/test/malloc.s <- test/malloc.c
build/test/malloc.o <- build/test/malloc.s
Linking bin/malloc
build/test/memcpy.s <- test/memcpy.c
build/test/memcpy.o <- build/test/memcpy.s
Linking bin/memcpy

You'll also find a modules.mk file in the test/ directory, which informs the Makefile of the test files to compile. If you want to write your own test, just put the source file in test/ and add it to the modules.mk file; both C (.c) and 6502 assembly (.s) files will work. Behind the scenes, the Makefile invokes cc65, ca65, and ld65, just as in your preliminary project, but using a custom library (bin/hawknest.lib, built from the sources in lib/) and a custom linker script (cfg/hawknest.cfg).

Running Hawknest

Once you have your compiled ROM files, you can run them in Hawknest. For example, we could run the ROM file for the hello.c test like this:

$> bin/hawknest-gcc-debug bin/hello

Notice that nothing happens: this is because there's no CPU to run the program! You will implement this, and when you do, you'll start to see some output here. You can also start Hawknest in an interactive shell, which is useful for debugging purposes:

$> bin/hawknest-gcc-debug -i bin/hello
$(hawknest-shell)>

Here you can type commands into the interactive shell, but aside from the memory peek, poke, and dump commands, they won't do anything. The memory commands will work just fine since I've already implemented the memory controller, RAM, and ROM subsystems for you. Note that you can also enter the shell for a running ROM, even if you didn't pass the -i argument to the Hawknest executable: just send a SIGINT (mapped to Ctrl-C in most terminal emulators) to Hawknest to pause the CPU and open the shell. This also applies for stopping continue or step commands early.

While working on your project, you may want to have debugging prints that only appear when you are debugging your CPU. I've provided a macro for you to achieve this. The macro is called DEBUG_PRINT. It works similarly to printf, but automatically appends a \n to each printout, and includes the filename and line number of the printout itself. Additionally, it will only produce output when the code is compiled for debugging (which is the default). See the appendix section for additional Hawknest compilation options.

Working with Instrumentation

By default, Hawknest is compiled with embedded runtime instrumentation, namely UndefinedBehaviorSanitizer and AddressSanitizer. The former detects and reports many instances of undefined behavior, and is always non-fatal. This is to say, UndefinedBehaviorSanitizer will never halt a program because it detects an instance of undefined behavior.

AddressSanitizer, on the other hand, is always fatal when it detects errors (mostly invalid memory access). Its checks are useful, it’s, but if it becomes a nuisance, you can disable it at build-time like this:

$> make NO_ASAN=1

On Linux, AddressSanitizer also performs a memory-leak check at program termination. This can be a bit noisy and annoying, as it includes leaks from within external libraries (e.g. SDL), and often fails to track down the origin of a given leak. Unfortunately, there’s no way to disable this at build-time without disabling all of AddressSanitizer, but it can be disabled at execution time by setting the ASAN_OPTIONS environment variable appropriately; in bash-compatible shells that looks like this:

$> ASAN_OPTIONS=detect_leaks=0 bin/hawknest-gcc-debug <rom_path>

This instrumentation is enabled for the entire Hawknest, not just the part you’re working on, so if you see an error report in code you didn’t write, let me know!

Part 3: Implementation

This section will outline some strategies for building your 6502v with readable code.

Strategy

The skeleton has been set up such that only emu/mos6502/mos6502.c should need to be modified to build the CPU. There's much more detailed information in that file, but the gist is that you'll need to fill in mos6502_step(...) such that it steps the CPU forward by one instruction. A few helper routines for reading and writing memory are already provided, along with some coarse cycle-counting. While you could write the entire CPU implementation inside mos6502_step, it's recommended that you add additional functions liberally.

If you're the kind of person that likes to split things into multiple files, the Makefile is built to accommodate that. If you place an additional source file in emu/mos6502/ and add it to emu/mos6502/modules.mk, it'll automatically be compiled and linked into Hawknest when you run make. Adding a header file to emu/include/mos6502/ requires no extra steps; anything in emu/include/ is immediately available for inclusion by source files. The Makefile also understands source and header dependencies, so you don't need to make clean when you edit a header file; just make like usual.

Suggested Tools

When navigating and working with a large codebase, there are several tools you might want to get familiar with. The first is ctags. This is a utility which creates an index of all the functions and symbols in your source tree, which you can then use to navigate easily within a text editor like vi.

For example, to use ctags for this project, in the Hawknest directory, run the following (you may have to install the ctags package):

$> ctags -R

This will create a tags file in this directory. To use this in vi or vim, add the following line to your .vimrc file in your home directory:

tags=tags;/

Now, within vi or vim, if you hover over a symbol that you want to see the definition of, e.g. a function, you can navigate to it simply by pressing ctrl + ]. You can go back to where you were before by typing ctrl + t. Emacs has similar capabilities.

Other tools that might be useful:

gdb / lldb - both interactive debuggers for compiled C/C++ programs
grep / ack / the_silver_searcher - all tools that search for text strings in a source tree (in order from slowest to fastest). You can also use plugins like Ctrl-p for your editor to do interactive searching/fuzzy finding
tmux - useful for having multiple terminal windows open at once
tig - command-line graphical interface for git repos

Updates

As we progress on this project, I may be making updates to the sections of code that you are not working on (e.g. graphics code, interrupt handling, libraries etc.). If I do make updates, I will ask you to get them from the repo using:

$> git pull

Because of this, I would advise against adding code to parts of the system outside of emu/mos6502/mos6502.c unless absolutely necessary.

Handin

Once you're convinced that your 6502v is complete, you will hand in your code as a git patch or as a set of git patches. In order for this to work, you want to commit changes to your code in logical chunks as you go. DO NOT just hand me one big commit containing all your code. Your commits should be clearly described with a log message, and should clearly show authorship. Here is how I make a commit:

...add stuff to git index using git add...
$> git commit --author="Kyle Hale <khale@cs.iit.edu>"

Then include a descriptive line such as “Added implementation of JMP instruction.”

When you're finished, you can turn your commits into patches using git format-patch. This will produce several .patch files which you should be able to group into a tarball, which you can then send to me directly.

Note that the due date for this assignment is Thursday, October 4 at 11:59PM.

Appendix 1: Configuring the Build Process

By default, invoking make compiles Hawknest using GCC in a “debug” configuration, but this is not the only way Hawknest can be built. Hawknest can be compiled using either Clang or GCC (with custom executables for either), and has three available compilation “modes”: DEBUG, OPT, and RELEASE. These are set using the COMPILER and MODE variables, so to compile with Clang in OPT mode:

make COMPILER=CLANG MODE=OPT

The build artifacts and final binaries are stored separately for each mode, so they will never conflict. That said, only one COMPILER and MODE can be used for a given make invocation. The name of the final binary reflects these compilation options, so the above invocation would produce bin/hawknest-clang-opt.

Modes

The compilation mode has the most pronounced effect on the behavior and performance of the final executable.

DEBUG mode is optimized for debugging. It's the only mode under which the DEBUG_PRINT macro is enabled, and it compiles Hawknest with a couple pieces of runtime instrumentation (ubsan and asan), which serve to catch common cases of undefined behavior or out-of-bounds memory accesses. DEBUG mode also enables a number of internal sanity checks within the skeleton, so if you manage to trip something outside of mos6502.c, let me know! Finally, DEBUG mode avoids compiler optimizations that would interfere with the use of an interactive debugger like gdb (namely inlining).
OPT keeps all the runtime instrumentation and sanity checks of DEBUG mode, but disables DEBUG_PRINT and enables almost all compiler optimizations. This makes it substantially less useful if you like an interactive debugger, but it also often runs substantially faster.
RELEASE mode takes the brakes off completely, dropping all runtime instrumentation and enabling link-time optimization. It's pretty useless for debugging, but it may be required to run NES games at full speed if you have a slower machine.

Compilers

The difference between compilers is not particularly pronounced outside of the diagnostics they produce, but if you prefer using Clang over GCC, you can specify the COMPILER variable:

make COMPILER=CLANG

It also may be the case that your compiler is not available with the usual name; you can override the command invoked for a particular compiler by specifying the corresponding EXEC variable, e.g.

make GCC_EXEC=gcc-8

This works for specifying the CC65 toolchain executables as well, so if they're not in your PATH, you can specify them manually:

make CC65_EXEC=path/to/cc65 AR65_EXEC=path/to/ar65 CA65_EXEC=path/to/ca65 LD65_EXEC=path/to/ld65

If all the CC65 executables are in the same directory, the above can be shortened to:

make CC65_PREFIX=path/to/cc65/prefix/dir

Appendix 2: Other Potentially-Useful Macros

Internally, Hawknest uses ASSERT and UNREACHABLE macros for sanity checks, and these are exposed to you as well through base.h. While you're certainly not required to use them, you may find them useful nonetheless.

The ASSERT macro is very similar to the assert macro in the C standard library. It's used as a function that accepts a single boolean value, e.g.:

ASSERT(x < 7)

If that expression evaluates to a logical false, a message documenting the exact failure is printed to stderr, but execution otherwise continues. This is to say, ASSERT is never fatal.

The UNREACHABLE macro is used to mark points in code that should not be possible to reach during execution (typically default labels in exhaustive switch statements), and is also used like a function (e.g. UNREACHABLE()), despite taking no arguments. When an UNREACHABLE() statement is executed, it has very similar behavior to an assertion failure; it’s also non-fatal.

Both UNREACHABLE and ASSERT are disabled when compiling in RELEASE mode, but otherwise stay active in DEBUG and OPT mode.

CS 595: Project 1: Hawknest 6502v Emulator