TAL Assembler

See also the UXN virtual machine

TAL is a Forth-like, two-pass assembler language that translates directly to UXN memory images.

Words

Words are up to 63 consecutive non-whitespace characters. For instance 0x75786E00 (ASCII uxn\0) would be one TAL "word", although its value spans several bytes. foo, bar-baz and quix/qux are all examples of words.
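A minimal Python sketch of this rule, assuming words are simply split on whitespace. The helper name split_words is illustrative and not part of uxnasm.

```python
MAX_WORD_LENGTH = 63  # limit stated in the text above


def split_words(source: str) -> list:
    """Split TAL source into whitespace-delimited words, rejecting over-long ones."""
    words = source.split()  # any run of whitespace separates two words
    for word in words:
        if len(word) > MAX_WORD_LENGTH:
            raise ValueError(f"word longer than {MAX_WORD_LENGTH} characters: {word[:16]}...")
    return words


print(split_words("0x75786E00 foo bar-baz quix/qux"))
# The first word is a single token even though its value is four bytes:
print(bytes.fromhex("75786E00"))  # b'uxn\x00'
```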

Words starting with _ are raw byte relative references. Words starting with , are literal byte relative references.

Comments

Comments in TAL are written ( ... ) and support nesting. E.g. ( () ) is a valid comment, but ( ( ) is not. Unlike Java, where a single */ ends a comment no matter how many /* appear inside it, TAL has no way to close all open comments at once: every ( needs its own ).
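The nesting rule can be modelled by counting parenthesis depth. This Python sketch assumes every ( and ) character changes the depth; the real uxnasm may instead expect parentheses to appear as separate words. strip_comments is a hypothetical helper.

```python
def strip_comments(source: str) -> str:
    """Remove nested ( ... ) comments by tracking parenthesis depth per character."""
    depth = 0
    kept = []
    for ch in source:
        if ch == "(":
            depth += 1          # entering a (possibly nested) comment
        elif ch == ")":
            if depth == 0:
                raise ValueError("')' without matching '('")
            depth -= 1          # leaving one level of comment
        elif depth == 0:
            kept.append(ch)     # outside any comment: keep the character
    if depth != 0:
        raise ValueError("unterminated comment")
    return "".join(kept)


print(strip_comments("#01 ( push one ( nested ) ) #02").split())
```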

Literals

Hex constants are written #[0-9a-f]{1,4}. For instance #00 and #ffff are valid hex constants, the first denoting a one-byte value and the second a two-byte (short) value. One- and two-byte quantities may also be written without the # prefix, in which case they are inserted raw rather than as literals.
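A Python sketch of decoding such a constant into big-endian value bytes. It assumes constants of up to two digits are bytes and longer ones are shorts, and it ignores any opcode the assembler may emit alongside a # literal. parse_hex_constant is a hypothetical name.

```python
import re

HEX_CONSTANT = re.compile(r"#([0-9a-f]{1,4})")


def parse_hex_constant(word: str) -> bytes:
    """Return the big-endian value bytes of a #-prefixed hex constant."""
    match = HEX_CONSTANT.fullmatch(word)
    if match is None:
        raise ValueError(f"not a hex constant: {word}")
    digits = match.group(1)
    width = 1 if len(digits) <= 2 else 2  # assumption: <= 2 digits is a byte, else a short
    return int(digits, 16).to_bytes(width, "big")


print(parse_hex_constant("#ffff").hex())  # ffff
```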

Words may be captured as ASCII strings. Such strings are written "<word>. For instance "foo causes the bytes #66 #6f #6f #00 to be inserted verbatim into the memory image.

As the " notation cannot capture whitespace, the #20 (space), #0a (newline) and #09 (tab) character constants are common.
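The string encoding described above, sketched in Python. encode_string_word is a hypothetical helper that follows the trailing #00 convention given in the text.

```python
def encode_string_word(word: str) -> bytes:
    """Encode a "word string as ASCII bytes followed by a NUL terminator."""
    if not word.startswith('"'):
        raise ValueError("string words start with a double quote")
    return word[1:].encode("ascii") + b"\x00"


print(" ".join(f"#{b:02x}" for b in encode_string_word('"foo')))  # #66 #6f #6f #00
```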

Brackets

[ and ] are treated as whitespace, and may be used for visual grouping. While in traditional Forth they switch between interpreting and compiling, they have no semantics in TAL.
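Taken literally, this means brackets can be erased before splitting words. A Python sketch under that assumption (uxnasm may only ignore brackets that stand alone as words); tokenize is a hypothetical helper.

```python
def tokenize(source: str) -> list:
    """Split source into words, treating [ and ] exactly like whitespace."""
    for bracket in "[]":
        source = source.replace(bracket, " ")
    return source.split()


print(tokenize("[ #01 #02 ] ADD"))
```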

Padding

|<number> "pad-absolute" pads the resulting UXN rom to a given absolute address. For instance |0000 would explicitly move the assembler's write position to 0x0000.

$<number> "pad-relative" pads the UXN rom by the specified number of bytes. For instance $2 would move the assembler's write position forward two bytes.
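A Python sketch of how the two padding forms move the assembler's write position. It assumes padding amounts are hexadecimal, like the other numbers in this document; apply_padding is a hypothetical helper.

```python
def apply_padding(point: int, word: str) -> int:
    """Return the new assembler write position after a |abs or $rel padding word."""
    value = int(word[1:], 16)  # assumption: padding amounts are written in hex
    if word.startswith("|"):
        return value            # pad-absolute: jump to the given address
    if word.startswith("$"):
        return point + value    # pad-relative: skip forward by value bytes
    raise ValueError(f"not a padding word: {word}")


point = apply_padding(0x0000, "|0100")
point = apply_padding(point, "$2")
print(hex(point))  # 0x102
```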

Labels

@<word> defines a top-level label. For instance @foo would make the word foo a valid symbol for use elsewhere. Defining a top-level label establishes a scope within which sub-labels may be defined.

&bar following @foo would create the label foo/bar. This can be used to build structured tables, such as the device layout in the example below.
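Label scoping can be sketched as a small state update in Python. define_label is a hypothetical helper that tracks the current top-level scope.

```python
from typing import Optional, Tuple


def define_label(scope: Optional[str], word: str) -> Tuple[str, str]:
    """Return (new_scope, full_label_name) for an @label or &sublabel word."""
    if word.startswith("@"):
        name = word[1:]
        return name, name                    # a top-level label opens a new scope
    if word.startswith("&"):
        if scope is None:
            raise ValueError("sub-label used before any top-level label")
        return scope, f"{scope}/{word[1:]}"  # a sub-label is prefixed with the scope
    raise ValueError(f"not a label definition: {word}")


scope, name = define_label(None, "@foo")
print(define_label(scope, "&bar"))  # ('foo', 'foo/bar')
```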

Example - the system device

|00 @System     &vector $2 &wst      $1 &rst    $1 &eaddr  $2 &ecode  $1 &pad     $1 &r       $2 &g      $2 &b     $2 &debug  $1 &halt $1

|00 aligns the assembler to 0x0000.

This line of code creates the following symbols:

  • System at 0x0000
  • System/vector at 0x0000
  • System/wst at 0x0002, shifted from System/vector by the $2
  • System/rst at 0x0003
  • System/eaddr at 0x0004
  • System/ecode at 0x0006, shifted from System/eaddr by the $2
  • System/pad at 0x0007
  • System/r at 0x0008
  • System/g at 0x000a
  • System/b at 0x000c
  • System/debug at 0x000e
  • System/halt at 0x000f
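The addresses can be recomputed mechanically from the example line. This Python sketch handles only the |, $, @ and & words used there; build_symbols is hypothetical and says nothing about how uxnasm is structured internally.

```python
SYSTEM_LINE = ("|00 @System &vector $2 &wst $1 &rst $1 &eaddr $2 &ecode $1 &pad $1 "
               "&r $2 &g $2 &b $2 &debug $1 &halt $1")


def build_symbols(source: str) -> dict:
    """Walk the words once, recording the write position at each label."""
    symbols = {}
    point = 0      # assembler write position
    scope = None   # most recent top-level label
    for word in source.split():
        sigil, rest = word[0], word[1:]
        if sigil == "|":
            point = int(rest, 16)                 # pad-absolute
        elif sigil == "$":
            point += int(rest, 16)                # pad-relative
        elif sigil == "@":
            scope = rest
            symbols[scope] = point                # top-level label
        elif sigil == "&":
            symbols[f"{scope}/{rest}"] = point    # sub-label in current scope
        else:
            raise ValueError(f"unsupported word in this sketch: {word}")
    return symbols


for name, addr in build_symbols(SYSTEM_LINE).items():
    print(f"{name} at 0x{addr:04x}")
```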

References

Labels may be referenced in one of seven ways (raw short absolute references have two spellings):

  • Literal byte zero-page - .label
  • Raw byte relative - _label
  • Literal byte relative - ,label
  • Raw byte absolute - -label
  • Raw short absolute - :label or =label
  • Literal short absolute - ;label

Literal references are preceded by a LIT or LIT2 opcode as appropriate. Raw references are inserted directly into the bytecode.

Short absolute references are double (two-byte) quantities, while zero-page and byte absolute references are single bytes. Relative references are single signed bytes, giving a range of -128 to +127.
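The sigils and sizes above can be tabulated, and a relative reference encoded as a signed byte, roughly as follows in Python. The text does not say which address a relative offset is measured from, so relative_offset_byte takes the base as a parameter; both names are hypothetical.

```python
# (mode, addressing, size in bytes) for each reference sigil listed above
REFERENCE_KINDS = {
    ".": ("literal", "zero-page", 1),
    "_": ("raw", "relative", 1),
    ",": ("literal", "relative", 1),
    "-": ("raw", "absolute", 1),
    ":": ("raw", "absolute", 2),
    "=": ("raw", "absolute", 2),
    ";": ("literal", "absolute", 2),
}


def relative_offset_byte(base: int, target: int) -> int:
    """Encode target - base as a two's-complement signed byte."""
    delta = target - base
    if not -128 <= delta <= 127:
        raise ValueError(f"relative reference out of range: {delta}")
    return delta & 0xFF


print(REFERENCE_KINDS[","], hex(relative_offset_byte(0x0110, 0x0100)))
```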

The zero page (#00XX) is used for system devices, among other things. It's common to see labels such as .System/vector, a reference to the address #0000 packed into just #00. As UXN has a special LDZ operation for loading from the zero page, a zero-page address such as #0096 can be specified as simply #96 to save a byte. As the last device is mapped to #CX, it is common to see #DX, #EX and #FX used for program-global variables for ease of access.
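A Python sketch of the saving: a full address needs two bytes, while a zero-page address fits in one. Both helper names are hypothetical.

```python
def zero_page_operand(address: int) -> bytes:
    """Return the one-byte operand for a zero-page address (#00XX becomes #XX)."""
    if not 0 <= address <= 0xFF:
        raise ValueError(f"0x{address:04x} is not in the zero page")
    return bytes([address])


def absolute_operand(address: int) -> bytes:
    """Return the two-byte, big-endian operand for a full 16-bit address."""
    return address.to_bytes(2, "big")


saved = len(absolute_operand(0x0096)) - len(zero_page_operand(0x0096))
print(saved)  # 1
```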

Literal byte relative references such as ,foo are used for control flow. Because they occupy a single signed byte, these references can only reach about 127 bytes in either direction. A typical opcode sequence is ,loop JMP, i.e. push the relative offset to the loop label and perform a computed relative jump. For bytecode compactness, UXN programs tend to use these one-byte relative jumps rather than two-byte absolute ones.
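The jump half of ,loop JMP can be modelled as adding a sign-extended byte to the program counter. Which program counter value UXN adds the offset to is not specified here, so this Python sketch takes it as a parameter; relative_jump_target is a hypothetical name.

```python
def relative_jump_target(pc: int, offset_byte: int) -> int:
    """Apply a one-byte signed offset to a 16-bit program counter."""
    offset = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte  # sign-extend
    return (pc + offset) & 0xFFFF  # wrap around the 64 KiB address space


print(hex(relative_jump_target(0x0120, 0xF0)))  # 0x110, a backward jump of 16 bytes
```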

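The difference between single- and double-byte references is critical, because the LDR instruction is a computed relative load taking a one-byte offset, whereas LDA is an absolute load taking a two-byte address. Pairing a reference of the wrong size with either instruction leaves a stray byte on the stack or consumes one too many.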
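A toy Python stack model makes the mismatch concrete. It assumes shorts are big-endian with the high byte deeper in the stack, and passes the program counter to LDR explicitly; lda and ldr here are simplified stand-ins, not the UXN implementation.

```python
def lda(stack: list, memory: bytearray) -> None:
    """Pop a two-byte absolute address (high byte deeper) and push the byte there."""
    low = stack.pop()
    high = stack.pop()
    stack.append(memory[(high << 8) | low])


def ldr(stack: list, memory: bytearray, pc: int) -> None:
    """Pop a one-byte signed offset and push the byte at pc + offset."""
    offset = stack.pop()
    if offset >= 0x80:
        offset -= 0x100
    stack.append(memory[(pc + offset) & 0xFFFF])


memory = bytearray(0x10000)
memory[0x0123] = 0x42

stack = [0x01, 0x23]            # a short address, as pushed by ;label
lda(stack, memory)              # correct pairing: stack is now [0x42]

stack = [0x01, 0x23]            # the same short, wrongly passed to LDR
ldr(stack, memory, pc=0x0200)   # only 0x23 is consumed; 0x01 is left behind
print(stack)                    # [1, 0]
```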

Includes

TAL files can include other files by writing ~<filename>. For instance the uxnasm.tal file writes ~projects/library/string.tal to include implementations of string functions. As in many other preprocessor and assembler languages, TAL does not support namespacing, renaming or selective importing.

  • All included code is assembled at the point where it is included.
  • TAL does not support multiple definition or idempotent includes, and will error on repeated or recursive inclusion.
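The include behaviour described above can be sketched in Python, with a hypothetical in-memory files dictionary standing in for the file system and comments ignored. expand_includes is likewise a hypothetical helper.

```python
from typing import Dict, List, Optional, Set


def expand_includes(name: str, files: Dict[str, str],
                    seen: Optional[Set[str]] = None) -> List[str]:
    """Inline ~file words recursively, rejecting any file included more than once."""
    if seen is None:
        seen = set()
    if name in seen:
        raise ValueError(f"repeated or recursive include: {name}")
    seen.add(name)
    words = []
    for word in files[name].split():
        if word.startswith("~"):
            # the included file's words are placed exactly where ~file appeared
            words.extend(expand_includes(word[1:], files, seen))
        else:
            words.append(word)
    return words


files = {"main.tal": "#01 ~lib.tal #02", "lib.tal": "@lib ADD"}
print(expand_includes("main.tal", files))
```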

Macros

Macros are named sequences of instructions which may be reused. Macros are defined by writing %macro-name { ... }. The canonical UXNASM does not allow macros to exceed 64 words in size.

When a macro is invoked by writing its name as a bare word, the contents of the macro are inserted in its place. Macros may reference other macros; these references are expanded with no recursion guard or depth limit.
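A Python sketch of definition and expansion, assuming words are already split and that a macro body ends at the first } (nested braces are not handled). expand_macros is a hypothetical helper.

```python
MAX_MACRO_WORDS = 64  # limit the text attributes to the canonical UXNASM


def expand_macros(words: list) -> list:
    """Collect %name { ... } definitions and expand bare macro names in place."""
    macros = {}

    def expand(word: str) -> list:
        if word not in macros:
            return [word]
        result = []
        for inner in macros[word]:
            result.extend(expand(inner))  # nested macros, with no recursion guard
        return result

    output = []
    i = 0
    while i < len(words):
        word = words[i]
        if word.startswith("%"):
            if i + 1 >= len(words) or words[i + 1] != "{":
                raise ValueError(f"macro {word} must be followed by '{{'")
            end = words.index("}", i + 2)  # raises ValueError if '}' is missing
            body = words[i + 2:end]
            if len(body) > MAX_MACRO_WORDS:
                raise ValueError(f"macro {word} exceeds {MAX_MACRO_WORDS} words")
            macros[word[1:]] = body
            i = end + 1
        else:
            output.extend(expand(word))
            i += 1
    return output


print(expand_macros("%INC { #01 ADD } %INC2 { INC INC } #05 INC2".split()))
# ['#05', '#01', 'ADD', '#01', 'ADD']
```

Because expansion has no guard, a self-referencing macro in this sketch recurses until Python raises RecursionError.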