description: A toy 8-bit ISA ("instruction set architecture"), and some tools to go with it.
last change: Wed, 12 Feb 2020 02:53:31 +0000 (18:53 -0800)
readme

ROSE-8 v0.2.2

A toy 8-bit ISA ("instruction set architecture"), and some tools to go with it. Probably horrible to write real programs in.

Features

Some dubiously "nice" properties:

Getting Started

The rose8-as tool in this package can interpret or assemble ROSE-8 assembly.

% swift run rose8-as hello.rose8

See below for the syntax supported by the assembler.

Hacking on ROSE-8

The actual architecture code is organized as follows:

And the assembler code:

If you want to add an instruction, check out the encoding table later in this file, then find all the places an existing instruction is referenced and add the equivalent handling for your new one.

Background

It all started with my colleague Cassie having fun designing a toy 8-bit ISA. I love encoding tables (I helped out a little with the one for Swift's String struct representation), and I did assignments in college involving simplified CPUs. So I started thinking about what it would be like to write a program in Cassie's ISA...and decided its four registers were too limited for me. How could I get up to 8 registers while still keeping most of the instructions in a single byte?

The answer has actually been around for a long time: make a special register called an accumulator. In some architectures, including the Intel 8080 that's an ancestor of the processors in most modern desktops, that means giving one regular register special privileges. But I chose to make the accumulator be its own thing, colloquially called "it". This name comes from programming environments that use "it" to refer to the last thing you accessed; the oldest one I've used is HyperTalk, and I've seen from LOLCODE that it's reasonable to have this be part of how you program.

With the accumulator as the implicit target of most operations, it was "easy" to have eight registers available and still have room for a large number of operations. I took some other hints from Cassie's ISA, like the compact encoding for bitstring immediate operations (see BITI below) based on the intuition that adding or subtracting large numbers known at compile-time isn't very common.
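To make the single-byte budget concrete, here is a rough Python sketch of packing and unpacking the ALUR form (01oo_oaaa) from the encoding table later in this file. The function names are made up for illustration, and the mapping of operations to bit patterns assumes they are numbered in the order the table lists them, which the table does not state outright.

```python
# Operation names as listed in the ALUR row of the encoding table.
# ASSUMPTION: they are numbered 0...7 in the order listed.
ALUR_OPS = ["addr", "subr", "andr", "iorr", "xorr", "lslr", "lsrr", "asrr"]


def encode_alur(op, reg):
    """Pack 'it <- it op ra' into one byte: 01 ooo aaa."""
    if not 0 <= reg <= 7:
        raise ValueError("register index must be in 0...7")
    return 0b0100_0000 | (ALUR_OPS.index(op) << 3) | reg


def decode_alur(byte):
    """Inverse of encode_alur; returns (operation name, register index)."""
    if (byte >> 6) != 0b01:
        raise ValueError("not an ALUR instruction")
    return ALUR_OPS[(byte >> 3) & 0b111], byte & 0b111
```

Because the accumulator is always the destination, two bits of prefix, three bits of operation, and three bits of register fit in the byte with nothing left over.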

The next big challenge was memory. (Cassie realized this too.) With 8-bit registers, you'd only be able to address 256 bytes of memory, which...isn't a ton. I took another hint from old Intel machines by using segment registers: accessing address 0x55 means something different based on which segment you're accessing it in. Around this point I also realized this felt pretty familiar...
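As a small illustration of why the same 8-bit address can mean different things, here is a Python sketch of one common way segment registers are combined with an offset. This is an assumption for illustration only: this README does not spell out how ROSE-8 combines the two, and the helper name is hypothetical.

```python
def effective_address(segment, offset):
    """Combine an 8-bit segment and an 8-bit offset into a 16-bit address.

    ASSUMPTION: the segment register supplies the high byte. The README
    only says that the segment changes what an address refers to.
    """
    return ((segment & 0xFF) << 8) | (offset & 0xFF)
```

Under that assumption, address 0x55 in segment 0x00 and address 0x55 in segment 0x12 are different bytes of memory (0x0055 and 0x1255).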

...and realized that the last 8-bit machine I heard about was the Game Boy, via Eevee's series about writing a Game Boy Color game, which I very much enjoyed / am enjoying. With that consciously realized, I got to check what I was doing against the list of Game Boy opcodes (provided by Randy Mongenel) to make sure I wasn't making any mistakes that Nintendo hadn't, at least not accidentally.

The last "clever bit" of this architecture (it's called ROSE-8 because I'm very imaginative) is how to do function calls. If you want more than 256 bytes of code, some of it is going to be too far away to refer to with an 8-bit register. It's the segment problem again! So there's a "code" segment for "normal" function calls...but since that means a fair amount of overhead getting to and from a far-away function, there's also support for "offset" function calls. Encoding jumps by offset (from the current instruction) is pretty standard practice for jumping around in a function, but unusual for calls, because the thing you're calling has to know how to get back. I'm still not sure if I think this is worth it, but I haven't really written enough big or even medium-sized programs to know yet.

(Also, there's no dedicated RET instruction, or even a "jump-absolute-to-it" or "jump-offset-by-it" instruction. To return from a function, you "call" the return address with CABA or COFA.)
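The return-offset arithmetic for offset calls can be checked with a short Python sketch, using the formulas from the COFA and COFI rows of the encoding table (return offset = -offset + instruction length). The helper names are hypothetical, and the sketch assumes that offsets are signed two's-complement bytes, that "pc" means the address of the call instruction itself, and that addresses wrap within a 256-byte segment.

```python
def to_signed(byte):
    """Interpret an 8-bit value as two's complement (assumed encoding)."""
    return byte - 256 if byte >= 128 else byte


def call_offset(pc, offset, length):
    """Model an offset call at address pc.

    length is 2 for COFI (opcode plus immediate) and 1 for COFA.
    Returns (target, link), where link is the return offset.
    """
    target = (pc + to_signed(offset)) & 0xFF
    link = (length - to_signed(offset)) & 0xFF
    return target, link


def return_address(callee_entry, link):
    """Where a COFA placed at the callee's entry point would land."""
    return (callee_entry + to_signed(link)) & 0xFF
```

With these assumptions, a COFI at 0x10 with offset 0x20 lands at 0x30 with a return offset of 0xE2 (-30), and applying that offset from 0x30 gives 0x12, the instruction after the two-byte call. A callee that returns from somewhere other than its entry point would presumably have to adjust the offset by however far it has moved, which is the "has to know how to get back" cost mentioned above.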

Future Directions

The instructions and their encodings

7654_3210

0000_0000 STOP halts execution, like an invalid instruction
     0001 NOPE "no operation" (with a more fun mnemonic)
     0010 PRNT print it (for debugging or toy programs)
     0011 WAIT spin or sleep until data1[it] > 0, then decrement data1[it] (for MMIO)
     0100 CABL call absolute code[link], link <- return addr, it <- return addr (deprecated), code <- return segment
     0101
     0110 CABA call absolute code[it], link <- return addr, it <- return addr (deprecated), code <- return segment
     0111 COFA call offset pc ± it, link <- return offset (-it + 1), it <- return offset (deprecated)

0000_1ooo ALU1 it <- {zero, lsl1, lsr1, asr1, incr, decr, comp, negt} it

0001_0000 GET1 it <- data1
     0001 GET2 it <- data2
     0010 GETC it <- code
     0011 GETL it <- link
     01xx
     1000 SET1 data1 <- it
     1001 SET2 data2 <- it
     1010 SETC code  <- it
     1011 SETL link  <- it
     11xx

0010_0aaa GETR it <- ra
   0_1aaa SETR ra <- it
   1_0aaa SWAP ra <- it, it <- ra
   1_1aaa ISLT "is less than", for testing overflow / carries: it <- (it < ra) ? 1 : 0

01oo_oaaa ALUR it <- it {addr, subr, andr, iorr, xorr, lslr, lsrr, asrr} ra

1000_0aaa LD1R it <- data1[ra]
  00_1aaa ST1R data1[ra] <- it
  01_0aaa LD1U it <- data1[ra], then ra += 1
  01_1aaa ST1U data1[ra] <- it, then ra += 1
  10_0aaa LD2R it <- data2[ra]
  10_1aaa ST2R data2[ra] <- it
  11_0aaa LD2U it <- data2[ra], then ra += 1
  11_1aaa ST2U data2[ra] <- it, then ra += 1

110x_xaaa               (reserved w/ register)

1110_0aaa LD2D ra -= 1, then it <- data2[ra]
1110_1aaa ST2D ra -= 1, then data2[ra] <- it

1111_00oo iiiiiiii ALUI it <- it {andi, iori, xori, (see below)} i
     0011 0iiiiiii ADDI it <- it + i
     0011 10oooiii BITI it <- it {roli, lsli, lsri, asri, (clri), (insi), (togi), exti} i
     0011 11iiiiii ADDI it <- it + (whole field, thus allowing many negative numbers)
     0100 iiiiiiii BEZI branch pc ± i if it == 0
     0101 iiiiiiii JOFI jump offset to pc ± i
     0110 iiiiiiii CABI call absolute code[i], link <- return addr, it <- return addr (deprecated), code <- return segment
     0111 iiiiiiii COFI call offset pc ± i, link <- return offset (-i + 2), it <- return offset (deprecated)
     10xx iiiiiiii      (reserved w/ immediate)
     110x iiiiiiii      (reserved w/ immediate)
     1110 iiiiiiii GETI it <- i
     1111 xxxxxxxx EXT1 extended encoding for "future-proofing"
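The three rows that share the 1111_0011 prefix split the second byte between ADDI and BITI. A minimal Python sketch of decoding that second byte follows. The function name is hypothetical; the sketch assumes the BITI operations are numbered in the order listed, and it reads "whole field" as "the whole byte taken as a signed two's-complement value".

```python
# BITI operation names as listed in the table (parentheses dropped).
# ASSUMPTION: they are numbered 0...7 in the order listed.
BITI_OPS = ["roli", "lsli", "lsri", "asri", "clri", "insi", "togi", "exti"]


def decode_addi_biti(byte):
    """Decode the byte that follows a 1111_0011 opcode byte."""
    if (byte & 0x80) == 0:
        # 0iiiiiii: ADDI with a non-negative immediate, 0...127
        return ("ADDI", byte)
    if (byte & 0xC0) == 0x80:
        # 10oooiii: BITI with a 3-bit operation and a 3-bit immediate
        return (BITI_OPS[(byte >> 3) & 0b111], byte & 0b111)
    # 11iiiiii: ADDI with the whole byte read as signed, -64...-1
    return ("ADDI", byte - 256)
```

Read this way, ADDI covers immediates from -64 to 127, and the 64 byte values that would otherwise be the most negative additions are handed to BITI instead.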

Assembler syntax

The instructions above have one of the following forms:

Integer values can be represented by integer literals (decimal, hex with a leading 0x, or binary with a leading 0b), as well as by character literals (single quote followed by an ASCII character or backslash escape). Some immediate instructions can take labels as well as integer values, like BEZI, JOFI, CABI, and COFI.
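As an illustration of the literal forms described above, here is a Python sketch of a parser for them. It is not the assembler's actual code (which is written in Swift); the function name and the particular set of backslash escapes are assumptions.

```python
# ASSUMPTION: a plausible set of escapes; the real assembler may differ.
ESCAPES = {"n": 10, "t": 9, "0": 0, "\\": 92, "'": 39}


def parse_integer(token):
    """Parse a decimal, 0x hex, 0b binary, or character literal."""
    if token.startswith("'"):
        body = token[1:]
        if body.startswith("\\"):
            return ESCAPES[body[1]]
        value = ord(body[0])
        if value > 127:
            raise ValueError("character literal must be ASCII")
        return value
    if token.startswith("0x"):
        return int(token[2:], 16)
    if token.startswith("0b"):
        return int(token[2:], 2)
    return int(token, 10)
```

For example, `42`, `0x2A`, `0b101010`, and `'*` would all produce the same value under this sketch.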

In addition to the instructions above, the assembler supports a handful of pseudo-instructions for computing addresses and other conveniences:

It also supports several directives that affect how the program is assembled.

Labels are defined with a leading colon: :loop.

A # character marks the rest of a line as a comment.
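The label and comment rules can be sketched as a small line classifier in Python. The function name is hypothetical, and the sketch makes two simplifying assumptions: a label definition sits on a line of its own, and a # never appears inside a character literal (a real assembler would have to handle that case).

```python
def classify_line(line):
    """Return (kind, text), where kind is 'blank', 'label', or 'instruction'."""
    # Everything from the first '#' onward is a comment.
    code = line.split("#", 1)[0].strip()
    if not code:
        return ("blank", "")
    if code.startswith(":"):
        # Labels are defined with a leading colon, e.g. ':loop'.
        return ("label", code[1:])
    return ("instruction", code)
```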

Note that the assembler syntax should not be considered part of the ISA spec. The ISA only defines the behavior of assembled machine code, not how that code is generated.

ISA Version History

0.2.3 (working)

0.2.2

0.2.1

0.2

0.1

Initial design, mostly just to show Cassie what was possible. I always intended to keep iterating on it.

shortlog
2020-02-12 Jordan Rose: Add a '.nosplit' directive [dev]
2020-02-08 Jordan Rose: Mark v0.2.3 as in progress, not complete
2020-02-08 Jordan Rose: Implement the link-updating logic of calls
2020-02-08 Jordan Rose: Implement CABL as a clone of CABA
2020-02-08 Jordan Rose: Implement GETL and SETL
2020-02-08 Jordan Rose: Add link register (GETL, SETL, CABL)
2020-02-08 Jordan Rose: Add end-to-end tests for new pseudo-instructions
2020-02-07 Jordan Rose: Sketch out end-to-end "expected output" tests
2020-02-07 Jordan Rose: Fix Linux tests
2020-02-07 Jordan Rose: Add `SUBI_ value`, an alias for `ADDI -value`
2020-02-07 Jordan Rose: Add "ANDIN", "ANDI NOT"
2020-02-07 Jordan Rose: Add XORIS and XORIA for comparing against addresses
2020-02-07 Jordan Rose: Add .seg, .addr, .offset directives
2020-02-07 Jordan Rose: rose8-as: Support `-o -` for dumping to stdout
2020-02-04 Jordan Rose: Add GameByColor to .gitignore
2020-02-04 Jordan Rose: Fix wraparound behavior for LD1U, and add a test.
...
tags
4 weeks ago ISA-v0.2.2 https://belkadan.com/blog/2020...
5 weeks ago ISA-v0.2.1 https://belkadan.com/blog/2020...
heads
2 weeks ago dev
4 weeks ago proposals/CABL