TAL Assembler
See also the UXN virtual machine
TAL is a Forth-like, two-pass assembly language that translates directly to UXN memory images.
Words
Words are up to 63 consecutive non-whitespace characters.
For instance, 0x75786E00 (ASCII "uxn\0") would be one TAL word, although its value is many bytes.
foo, bar-baz and quix/qux would all be examples of words.
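A minimal Python sketch of the word rule, written for illustration and not taken from uxnasm: it splits source on whitespace and rejects words longer than 63 characters. The function name tokenize is hypothetical, and comments are not handled.

```python
# Sketch only: a simplified model of TAL word splitting, not uxnasm's own code.
MAX_WORD_LEN = 63  # maximum word length stated in the text


def tokenize(source: str) -> list[str]:
    """Split TAL source text into whitespace-delimited words."""
    words = source.split()  # no argument: split on any run of whitespace
    for word in words:
        if len(word) > MAX_WORD_LEN:
            raise ValueError(f"word exceeds {MAX_WORD_LEN} characters: {word[:16]}...")
    return words


print(tokenize("foo bar-baz\n\tquix/qux"))  # ['foo', 'bar-baz', 'quix/qux']
```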
Words starting with _ are raw relative references, and words starting with , are literal relative references.
Comments
Comments in TAL are written ( ... ) and support nesting. For example, ( () ) is a valid comment, while ( ( ) is not.
Because comments nest, TAL has no single token that closes every open comment at once, in the way that Java's */ ends a block comment no matter how many /* it contains.
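The nesting rule can be modelled with a depth counter. This sketch assumes every ( and ) character acts as a delimiter, which may differ from how uxnasm recognizes them; strip_comments is a hypothetical name.

```python
# Sketch: nested ( ... ) comment removal, assuming every "(" and ")" character
# is a delimiter (uxnasm may only recognise them at word boundaries).
def strip_comments(source: str) -> str:
    depth = 0
    out = []
    for ch in source:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise SyntaxError("')' without a matching '('")
            depth -= 1
            if depth == 0:
                out.append(" ")  # replace the whole comment with one space
        elif depth == 0:
            out.append(ch)
    if depth != 0:
        raise SyntaxError("unterminated comment")
    return "".join(out)


print(strip_comments("#01 ( () ) #02").split())  # ['#01', '#02']
```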
Includes
TAL files can include other files by writing ~<filename>.
For instance, the uxnasm.tal file writes ~projects/library/string.tal to include implementations of string functions.
As with many other preprocessor and assembly languages, TAL does not support namespacing, renaming or selective importing.
- All included code is assembled at the point where it is included.
- TAL does not support multiple definition or idempotent includes, and will error on repeated or recursive inclusion.
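A sketch of include expansion under the rules above, using a hypothetical in-memory dictionary in place of the file system. Because every included name is remembered and never forgotten, both a repeated include and an include cycle raise an error. Comments are ignored for brevity.

```python
from typing import Optional


def expand_includes(name: str, files: dict[str, str], seen: Optional[set[str]] = None) -> str:
    """Inline every ~filename word, refusing repeated or recursive inclusion."""
    if seen is None:
        seen = set()
    if name in seen:
        raise RuntimeError(f"{name} is included more than once")
    seen.add(name)  # never removed, so both repeats and cycles are caught
    out = []
    for word in files[name].split():
        if word.startswith("~"):
            # Included text is spliced in at exactly this point.
            out.append(expand_includes(word[1:], files, seen))
        else:
            out.append(word)
    return " ".join(out)


files = {  # hypothetical in-memory "file system"
    "main.tal": "@main ~lib.tal BRK",
    "lib.tal": "@helper",
}
print(expand_includes("main.tal", files))  # @main @helper BRK
```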
Macros
Macros are named sequences of instructions that may be repeated.
A macro is defined by writing %macro-name { ... }.
The canonical UXNASM does not allow a macro body to exceed 64 words.
A macro is invoked by writing its name as a bare word, and its contents are inserted in place. Macros may reference other macros; these references are expanded with no recursion guard or depth limit.
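Macro definition and expansion might be modelled as below. The names process_macros and expand, and the example macros add-one and add-two, are hypothetical; the sketch assumes { and } are separate words and that macro bodies do not contain nested braces.

```python
MAX_MACRO_WORDS = 64  # limit the text attributes to the canonical UXNASM


def expand(word, macros):
    """Recursively expand a macro name; other words pass through unchanged."""
    if word not in macros:
        return [word]
    result = []
    for inner in macros[word]:
        # No recursion guard, mirroring the text: a self-referencing macro
        # recurses until Python raises RecursionError.
        result.extend(expand(inner, macros))
    return result


def process_macros(words):
    """Collect %name { ... } definitions and expand macro invocations."""
    macros = {}
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        if word.startswith("%"):
            if i + 1 >= len(words) or words[i + 1] != "{":
                raise SyntaxError(f"macro {word} must be followed by {{")
            end = words.index("}", i + 2)  # ValueError if the body is never closed
            body = words[i + 2:end]
            if len(body) > MAX_MACRO_WORDS:
                raise SyntaxError(f"macro {word} exceeds {MAX_MACRO_WORDS} words")
            macros[word[1:]] = body
            i = end + 1
        else:
            out.extend(expand(word, macros))
            i += 1
    return out


src = "%add-one { #0001 ADD2 } %add-two { add-one add-one } #0000 add-two"
print(process_macros(src.split()))
# ['#0000', '#0001', 'ADD2', '#0001', 'ADD2']
```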
Padding
|<number>
"pad-absolute" pads the resulting UXN ROM to a given absolute address.
For instance, |0x0000 would explicitly move the assembler's point (its current write position) to 0x0000.
$<number>
"pad-relative" pads the UXN ROM by the specified number of bytes.
For instance, $2 would move the assembler's point forward two bytes.
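Both padding forms only move the assembler's point. In this hypothetical RomImage model, the image starts at address 0 and any gap left by padding is zero-filled when a later byte is written; whether uxnasm fills gaps the same way is an assumption.

```python
class RomImage:
    """Hypothetical model of the assembler's output buffer and its point."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.ptr = 0  # the assembler's current write position ("point")

    def pad_absolute(self, addr: int) -> None:  # |<number>
        self.ptr = addr

    def pad_relative(self, count: int) -> None:  # $<number>
        self.ptr += count

    def emit(self, *values: int) -> None:
        for value in values:
            if self.ptr >= len(self.data):
                # Zero-fill any gap left by padding.
                self.data.extend(bytes(self.ptr + 1 - len(self.data)))
            self.data[self.ptr] = value
            self.ptr += 1


rom = RomImage()
rom.pad_absolute(0x10)
rom.emit(0xAA)
rom.pad_relative(2)
rom.emit(0xBB)
print(rom.data[0x10:].hex())  # aa0000bb
```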
Labels
@<word>
defines a top-level label.
For instance, @foo would make the word foo a valid symbol for use elsewhere.
Defining a top-level label establishes a scope within which sub-labels may be defined.
&bar following @foo would create the label foo/bar.
This can be used to name the fields of a table or data structure.
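The scoping rule can be sketched as follows. To stay focused on labels, this hypothetical resolve_labels only understands @, & and $ words, and assumes $ arguments are hexadecimal like other TAL numbers.

```python
def resolve_labels(words, start=0):
    """Assign addresses to @label and &sub-label words.

    Assumption for this sketch: only $<hex> pads advance the address;
    any other word is rejected rather than sized.
    """
    labels = {}
    scope = None
    addr = start
    for word in words:
        if word.startswith("@"):
            scope = word[1:]  # a top-level label opens a new scope
            labels[scope] = addr
        elif word.startswith("&"):
            if scope is None:
                raise SyntaxError(f"sub-label {word} outside any @label")
            labels[f"{scope}/{word[1:]}"] = addr
        elif word.startswith("$"):
            addr += int(word[1:], 16)  # assumed hexadecimal
        else:
            raise ValueError(f"word not handled by this sketch: {word}")
    return labels


print(resolve_labels("@point &x $2 &y $2 @color &r $1".split()))
# {'point': 0, 'point/x': 0, 'point/y': 2, 'color': 4, 'color/r': 4}
```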
References
Labels may be referenced in one of seven ways:
- Literal byte zero-page: .label
- Raw byte relative: _label
- Literal byte relative: ,label
- Raw byte absolute: -label
- Raw short absolute: :label or =label
- Literal short absolute: ;label
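Literal references are inserted with a preceding LIT or LIT2 as appropriate, so that the address is pushed onto the stack at runtime.
Raw references are inserted directly into the bytecode.
Short absolute references are two-byte quantities, while byte references are single bytes. Relative references are signed bytes, giving a range of -128 to +127.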
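The following sketch encodes each reference form from the list. The opcode values LIT = 0x80 and LIT2 = 0xA0 are assumed from the current UXN opcode table and may differ in the version described here, and the base address that relative offsets are measured from (here) is a hypothetical parameter, since the text does not pin it down.

```python
LIT, LIT2 = 0x80, 0xA0  # assumption: opcode values from the current UXN opcode table

# sigil -> (literal?, size in bytes, addressing mode), per the list above
SIGILS = {
    ".": (True, 1, "absolute"),   # literal byte zero-page
    "_": (False, 1, "relative"),  # raw byte relative
    ",": (True, 1, "relative"),   # literal byte relative
    "-": (False, 1, "absolute"),  # raw byte absolute
    ":": (False, 2, "absolute"),  # raw short absolute
    "=": (False, 2, "absolute"),  # raw short absolute
    ";": (True, 2, "absolute"),   # literal short absolute
}


def encode_reference(word: str, labels: dict, here: int) -> bytes:
    """Encode a label reference; `here` is the assumed base for relative offsets."""
    literal, size, mode = SIGILS[word[0]]
    target = labels[word[1:]]
    if mode == "relative":
        value = target - here
        if not -128 <= value <= 127:
            raise ValueError(f"{word} is out of relative range")
        value &= 0xFF  # store as a two's-complement byte
    else:
        value = target
        if size == 1 and value > 0xFF:
            raise ValueError(f"{word} does not point into the zero page")
    body = value.to_bytes(size, "big")  # UXN shorts are big-endian
    if literal:
        return bytes([LIT if size == 1 else LIT2]) + body
    return body


labels = {"loop": 0x0100, "Mouse/state": 0x96}
print(encode_reference(";loop", labels, here=0).hex())         # a00100
print(encode_reference(".Mouse/state", labels, here=0).hex())  # 8096
print(encode_reference(",loop", labels, here=0x0105).hex())    # 80fb
```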
The zero page (#00XX) is used for system devices, among other things.
It's common to see labels such as .Mouse/state, which refers to the address #0096.
However, as UXN has a special LDZ operation for loading from the zero page, this address can be specified as simply #96 to save a byte.
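The saving can be seen by comparing two encodings of the same address. As elsewhere, the LIT and LIT2 opcode values are assumptions, and address_literal is a hypothetical helper that picks the shorter form whenever the high byte is zero.

```python
LIT, LIT2 = 0x80, 0xA0  # assumption: current UXN opcode values


def address_literal(addr: int) -> bytes:
    """Push an address using as few bytes as possible."""
    if addr <= 0xFF:
        return bytes([LIT, addr])  # one-byte zero-page address, suits LDZ
    return bytes([LIT2]) + addr.to_bytes(2, "big")  # full short, suits LDA


full = bytes([LIT2]) + (0x0096).to_bytes(2, "big")  # what #0096 would take
short = address_literal(0x96)                        # what #96 takes
print(full.hex(), short.hex())  # a00096 8096
```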
As the last device is mapped to #CX, it is common to see #DX, #EX and #FX used for program-global variables, for ease of access.
Literal byte relative references such as ,foo are used for control flow.
Because they occupy a single signed byte, these references have a range of -128 to +127 bytes.
A typical opcode sequence is ,loop JMP, i.e. push the relative offset to the loop label and perform a relative jump.
For bytecode compactness, UXN programs tend to use relative (computed) rather than absolute jumps.
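To show how one signed byte reaches both backwards and forwards, the sketch below pairs an assembler-side offset calculation with VM-side decoding. That the VM adds the offset to the program counter and wraps at 16 bits is an assumption, as is the exact counter value the offset is added to; the function names are hypothetical.

```python
def to_signed(byte: int) -> int:
    """Interpret 0x00..0xFF as a two's-complement value in -128..127."""
    return byte - 0x100 if byte >= 0x80 else byte


def offset_byte(pc: int, target: int) -> int:
    """Assembler side: the byte a relative reference from pc to target needs."""
    delta = target - pc
    if not -128 <= delta <= 127:
        raise ValueError(f"target {target:#06x} unreachable from {pc:#06x}")
    return delta & 0xFF


def relative_jump(pc: int, offset: int) -> int:
    """VM side (assumed): add the signed offset to pc, wrapping at 16 bits."""
    return (pc + to_signed(offset)) & 0xFFFF


# A backwards jump to a loop head 16 bytes earlier round-trips:
b = offset_byte(0x0110, 0x0100)
print(hex(b), hex(relative_jump(0x0110, b)))  # 0xf0 0x100
```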
The difference between single-byte and two-byte references is critical, because LDR is a relative load that takes a one-byte offset, whereas LDA is an absolute load that takes a two-byte address.
Brackets
[ and ] are treated as whitespace and may be used for visual grouping.
In traditional Forth they switch between interpretation and compilation, but in TAL they have no semantics.
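Treating brackets as whitespace amounts to one extra step before splitting. This sketch assumes every [ and ] character is affected, even inside a word; tokenize_grouped is a hypothetical name.

```python
def tokenize_grouped(source: str) -> list[str]:
    """Split source into words, treating [ and ] exactly like whitespace."""
    return source.translate(str.maketrans("[]", "  ")).split()


print(tokenize_grouped("[ #01 #02 ] ADD [#03]"))  # ['#01', '#02', 'ADD', '#03']
```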
Literals
Hex constants are written #[0-9a-f]{1,4}.
For instance, #00 and #ffff are valid hex constants, the first denoting a one-byte value and the second a two-byte value.
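A sketch of hex constant parsing using the pattern above. The rule that one or two digits make a byte and three or four make a short is an assumption (the text only shows two- and four-digit examples), and only the value bytes are returned, not any opcode the assembler may emit in front of them; parse_hex_constant is a hypothetical name.

```python
import re

HEX_CONSTANT = re.compile(r"#([0-9a-f]{1,4})")  # the pattern given above


def parse_hex_constant(word: str) -> bytes:
    """Return the value bytes of a #hex constant (big-endian)."""
    match = HEX_CONSTANT.fullmatch(word)
    if match is None:
        raise ValueError(f"not a hex constant: {word}")
    digits = match.group(1)
    # Assumption: up to 2 digits is a byte, 3 or 4 digits is a short.
    width = 1 if len(digits) <= 2 else 2
    return int(digits, 16).to_bytes(width, "big")


print(parse_hex_constant("#00").hex(), parse_hex_constant("#ffff").hex())  # 00 ffff
```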
One- and two-byte literal quantities may also be provided without the # prefix.
Words may be captured as ASCII strings, written "<word>.
For instance, "foo would cause the bytes #66 #6f #6f #00 to be inserted literally into the memory image.
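String capture amounts to a direct ASCII encoding with a trailing zero byte, as in the "foo example. A minimal sketch, with encode_string as a hypothetical name:

```python
def encode_string(word: str) -> bytes:
    """Encode a "word string as ASCII bytes followed by a NUL byte."""
    if not word.startswith('"'):
        raise ValueError(f"not a string word: {word}")
    return word[1:].encode("ascii") + b"\x00"


print(encode_string('"foo').hex(" "))  # 66 6f 6f 00
```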
As the " notation cannot capture whitespace, the #20 (space), #0a (newline) and #09 (tab) character constants are common.