TAL Assembler

See also the UXN virtual machine

TAL is a Forth-like, two-pass assembly language that translates directly to UXN memory images.

Words

Words are up to 63 consecutive non-whitespace characters. For instance loop, System, Mouse/x, my-routine and some_other_routine are all examples of words. The UXN instructions themselves (ADD, POP, LIT and so forth) are all words.

Some words have special interpretations.

Opcodes

See the UXN documentation for a full listing of opcodes, but BRK, INC, POP, NIP, SWP .... SFT as words all mean their respective opcodes. These opcodes may be followed by the flags k, r or 2 to set the keep, return and short flags. For instance INC2 as a word would increment a two-byte quantity at the top of the stack. INC2k would keep the original value, leaving x x+1 on the stack.
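A short sketch of how the mode flags change an opcode's effect. It uses #-prefixed literals to put values on the stack, and each line should be read on its own, starting from an empty stack; the values are arbitrary.

```tal
( each line starts from an empty stack; comments show the result )
#0001 INC2     ( working stack: 00 02 )
#0001 INC2k    ( working stack: 00 01 00 02 )
#12 #34 ADD    ( working stack: 46 )
#12 #34 ADDk   ( working stack: 12 34 46 )
LITr 12 INCr   ( return stack: 13 )
```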

Numbers

Hexadecimal numbers are written with either two or four digits. For instance 00 would be the single word 0x00. 0000 is equivalent to the two words 00 00. UXN is big-endian: the value 0xFF00 is represented as the sequential words FF 00.

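Numbers are usually prefixed with #, which marks them as literals: #12 assembles to LIT 12 and #1234 to LIT2 12 34, so the value is pushed onto the stack at run time. A bare number is inserted into the memory image as raw bytes.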
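As a sketch of the difference between bare and #-prefixed numbers, assuming the canonical assembler's behaviour, in which # emits a LIT or LIT2 opcode ahead of the value:

```tal
12       ( raw byte: assembles to 12 )
1234     ( raw short: assembles to 12 34, high byte first )
#12      ( literal byte: assembles to LIT 12, i.e. 80 12 )
#1234    ( literal short: assembles to LIT2 12 34, i.e. a0 12 34 )
```

Raw numbers are typically used for data tables, while literals are used inside code that needs the value on the stack.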

Strings

Words may be captured as ASCII strings, written "<word>. For instance "foo would cause the bytes 0x66 0x6f 0x6f to be inserted literally into the memory image. The canonical assembler does not append a terminator, so a trailing 00 is written explicitly where one is needed.

As " notation cannot capture whitespace, the character constants 20 (space), 0a (newline) and 09 (tab) commonly appear alongside strings.
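A hedged sketch of mixing string words with raw character bytes. The label name greeting is illustrative, and the trailing 00 is an explicit terminator, written on the assumption that the assembler does not append one itself.

```tal
( under that assumption, assembles to: )
( 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 0a 00 )
@greeting "Hello 20 "World! 0a 00
```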

Comments

Comments in TAL are written ( ... ) and support nesting. E.g. ( () ) is a valid comment, whereas ( ( ) is not, because the outer comment is never closed. TAL has no way to "close all open comments" at once, as Java and some other languages do.
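For illustration, a few comment forms. The parentheses are kept separated from their contents by whitespace, which is the conservative way to write them.

```tal
( a simple comment )
( an outer comment ( with a nested comment ) that continues until here )
#01 ( comments may sit between words ) #02 ADD
```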

Brackets

[ and ] are treated as whitespace, and may be used for visual grouping. While they have semantics in traditional Forth, they have no semantics in TAL.

Assembler directives

Padding

|<number> "pad-absolute" pads the resulting UXN ROM to a given absolute address. For instance |0000 would explicitly move the assembler's point to 0x0000.

$<number> "pad-relative" pads the UXN ROM by the specified number of words (bytes). For instance $2 would move the assembler's point forwards two words.
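A small sketch of how the two padding directives move the assembler's point. The addresses are chosen for illustration; 0x0100 is used because UXN programs conventionally begin there.

```tal
|0100          ( move the assembler's point to 0x0100 )
#01 #02 ADD    ( 5 bytes, occupying 0x0100 to 0x0104 )
$2             ( skip 2 bytes: the point is now 0x0107 )
|0200          ( pad the ROM out to 0x0200 )
```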

Labels

@<word> defines a top-level label. For instance @foo would make the word foo a valid symbol for use elsewhere. Defining a top-level label establishes a scope within which sub-labels may be defined.

&bar following @foo would create the label foo/bar. This can be used to lay out tables of named fields.

Numbers and opcodes cannot be used as label names.
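A minimal sketch of label scoping; the names counter, other, lo and hi are hypothetical.

```tal
@counter        ( defines counter and opens a new scope )
    &lo $1      ( defines counter/lo )
    &hi $1      ( defines counter/hi )
@other          ( opens a new scope )
    &lo $1      ( defines other/lo, which does not clash with counter/lo )
```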

Example - the system device

|00 @System     &vector $2 &wst      $1 &rst    $1 &eaddr  $2 &ecode  $1 &pad     $1 &r       $2 &g      $2 &b     $2 &debug  $1 &halt $1

|00 aligns the assembler to 0x0000.

This line of code creates the following symbols:

  • System at 0x0000
  • System/vector at 0x0000
  • System/wst at 0x0002, shifted from System/vector by the $2
  • System/rst at 0x0003
  • System/eaddr at 0x0004
  • System/ecode at 0x0006, shifted from System/eaddr by the $2
  • System/pad at 0x0007
  • System/r at 0x0008
  • System/g at 0x000a
  • System/b at 0x000c
  • System/debug at 0x000e
  • System/halt at 0x000f

Label References

Labels may be referenced using one of seven sigils:

  • Literal byte zero-page - .label
  • Raw byte relative - _label
  • Literal byte relative - ,label
  • Raw byte absolute - -label
  • Raw short absolute - :label or =label
  • Literal short absolute - ;label

Literal labels are inserted with a LIT or LIT2 as appropriate. Raw labels are inserted directly into bytecode.

Absolute labels are two-byte (short) quantities. Relative labels are single signed-byte quantities with a ±127 range.
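The reference forms might be combined as in the following sketch. All label names here (counter, main, end, table, table-pointer) are hypothetical, and the instructions are only there to give each reference something to feed.

```tal
|00 @counter $1        ( hypothetical zero-page variable at 0x0000 )

|0100
@main
    .counter LDZ       ( LIT 00, then LDZ loads the byte at 0x0000 )
    INC
    .counter STZ       ( store the incremented byte back )
    ;table LDA         ( LIT2 plus the address of table, then LDA loads its first byte )
    POP
    ,end JMP           ( LIT plus the signed offset to end, then a relative jump )
@end
    BRK

@table 01 02 03
@table-pointer =table  ( the raw 2-byte address of table, with no LIT2 )
```

The literal forms are used in code, where the address must reach the stack; the raw forms suit data such as pointer tables.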

The zero page (#00XX) is used for system devices, among other things. It's common to see references such as .System/vector, which packs the address #0000 into the single byte #00. Because UXN has a special LDZ operation for loading from the zero page, the address can be specified as just that one byte, saving a byte over the full short. As the last device is mapped to #CX, it is common to see #DX, #EX and #FX used for program-global variables for ease of access.

Literal byte relative references such as ,foo are used for control flow. Using only a single byte, these references have a range of ±127 bytes. A typical opcode sequence would be ,loop JMP, i.e. emit the relative address of the loop label and perform a relative jump. For bytecode compactness, UXN programs tend to use relative rather than absolute jumps.
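As a sketch of relative control flow, here is a countdown loop. It uses JCN, the conditional jump, rather than the unconditional JMP mentioned above; the counter value is arbitrary.

```tal
#05                  ( loop counter on the stack )
@loop
    #01 SUB          ( decrement the counter: 05, 04, ... 00 )
    DUP ,loop JCN    ( jump back to loop while the counter is non-zero )
POP                  ( drop the final 00 )
```

JCN consumes both the copied counter and the relative offset, so the original counter survives each iteration.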

The difference between byte (relative) and short (absolute) references is critical, because the LDR instruction is a relative load taking a signed byte offset, whereas LDA is an absolute load taking a short address.
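To illustrate the pairing of reference kind and load instruction, a small sketch with the hypothetical labels read-both and value:

```tal
@read-both
    ,value LDR     ( relative: 3 bytes, value must be within range of this point )
    ;value LDA     ( absolute: 4 bytes, value may be anywhere in memory )
    ADD            ( 2a + 2a = 54 )
    JMP2r          ( return to the caller )
@value 2a
```

Pairing a byte reference with LDA, or a short reference with LDR, would leave the wrong number of bytes on the stack.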

Includes

TAL files can include other files by writing ~<filename>. For instance the uxnasm.tal file writes ~projects/library/string.tal to include implementations of string functions. As with other preprocessor and assembler languages, TAL does not support namespacing, renaming or selective importing.

  • All included code is assembled at the point where it is included.
  • TAL does not support multiple definition or idempotent includes, and will error on repeated or recursive inclusion.
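A sketch of where included code lands, using the library path mentioned above inside an otherwise hypothetical file:

```tal
( main.tal: a hypothetical top-level file )
|0100
@main
    BRK

( the contents of string.tal are assembled here, directly after main )
~projects/library/string.tal
```

Because there is no namespacing, every label defined in the included file shares the same global scope as main.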

Macros

Macros are sequences of instructions which may be repeated. Macros are defined by writing %macro-name { ... }. The canonical UXNASM does not allow macros to exceed 64 words in size.

When macros are invoked by using the macro-name as a bare word, the contents of the macro will be inserted. Sub-macro references are supported and will be expanded with no recursion guards or limit.
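A minimal sketch of macro definition and expansion; the macro names DOUBLE and QUADRUPLE are hypothetical.

```tal
%DOUBLE { DUP ADD }              ( x -- 2x )
%QUADRUPLE { DOUBLE DOUBLE }     ( a macro that references another macro )

#03 QUADRUPLE    ( expands to #03 DUP ADD DUP ADD, leaving 0c )
```

Since expansion is purely textual and unguarded, a macro that refers to itself would expand without end.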