# TAL Assembler

See also [the UXN virtual machine](./uxn.md)

TAL is a Forth-like, two-pass assembler language that translates directly to UXN memory images.

## Words

Words are up to 63 consecutive non-whitespace characters.
For instance `0x75786E00` (ASCII `uxn\0`) would be one TAL "word" although its value is four bytes.
`foo`, `bar-baz` and `quix/qux` would all be examples of words.

Words starting with `_` are defined to be relative references.
Words starting with `,` are literal relative references (see References).

## Comments

Comments in TAL are written `( ... )` and support nesting.
E.g. `( () )` is a valid comment, while `( ( )` is not because the outer comment is never closed.
Unlike Java and some other languages, where a single closing delimiter ends the comment no matter how many openers precede it, TAL has no way to close all open comments at once.

## Includes

TAL files can include other files by writing `~` followed by the path of the file.
For instance the `uxnasm.tal` file writes `~projects/library/string.tal` to include implementations of string functions.

As with other preprocessor and assembler languages, TAL does not support namespacing, renaming or selective importing.

- All included code is assembled at the point where it is included.
- TAL does not support multiple definition or idempotent includes, and will error on repeated or recursive inclusion.

## Macros

Macros are sequences of instructions which may be repeated.
Macros are defined by writing `%macro-name { ... }`.
The canonical UXNASM does not allow macros to exceed 64 words in size.

When a macro is invoked by using the macro-name as a bare word, the contents of the macro are inserted in its place.
Sub-macro references are supported and will be expanded with no recursion guards or limit.

## Padding

`|` "pad-absolute" pads the resulting UXN ROM to a given absolute address.
For instance `|0x0000` would explicitly align the assembler's point to `0x0000`.

`$` "pad-relative" pads the UXN ROM by the specified number of words (bytes).
For instance `$2` would move the assembler's point forwards two words.

## Labels

`@` defines a top-level label.
For instance `@foo` would make the word `foo` a valid symbol for use elsewhere.

Defining a top-level label establishes a scope within which sub-labels may be defined.
`&bar` following `@foo` would create the label `foo/bar`.
This can be used to create semantic tables.

## References

Labels may be referenced in six ways, using one of seven sigils:

- Literal byte zero-page: `.label`
- Raw byte relative: `_label`
- Literal byte relative: `,label`
- Raw byte absolute: `-label`
- Raw short absolute: `:label` or `=label`
- Literal short absolute: `;label`

Literal labels are inserted with a `LIT` or `LIT2` as appropriate.
Raw labels are inserted directly into bytecode.

Absolute labels are double quantities.
Relative labels are single signed byte quantities with a ±127 range.

The zero page (`#00XX`) is used for system devices, among other things.
It's common to see labels such as `.Mouse/state`, which is a reference to the address `#0096`.
However, as UXN has a special `LDZ` operation for loading from the zero page, this address can be specified as simply `#96` to save a byte.
As the last device is mapped to `#CX`, it is common to see `#DX`, `#EX` and `#FX` used for program-global variables for ease of access.

Literal byte relative references such as `,foo` are used for control flow.
Using only a single byte, these references have a range of ±127 bytes.
A typical opcode sequence would be `,loop JMP`, i.e. emit a relative address value to the `loop` label and perform a computed relative jump.
For bytecode compactness, UXN programs tend to use computed relative jumps rather than absolute jumps.

The difference between single and double word references is critical, because the `LDR` instruction is a computed relative load, whereas `LDA` is an absolute short address load.

## Brackets

`[` and `]` are treated as whitespace, and may be used for visual grouping.
While they have semantics in traditional Forth, they have no semantics in TAL.

## Literals

Hex constants are written `#[0-9a-f]{1,4}`.
For instance `#00` or `#ffff` would be valid hex constants, the first assembling to one byte, the second to two.
One and two byte literal quantities may also be provided without the `#` prefix.

Words may be captured as ASCII formatted strings.
Such strings are written with a leading `"`.
For instance `"foo` would cause the bytes `#66 #6f #6f #00` to be literally inserted into the memory image.
As `"` notation cannot capture whitespace, the `#20` (space), `#0a` (newline) and `#09` (tab) character constants are common.
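
The literal rules above can be sketched in Python. This is an illustration only, not how any particular assembler is implemented: the function names are hypothetical, only the value bytes are produced (no opcode prefix), and the handling of three-digit constants as shorts is an assumption the text does not settle. Shorts are written big-endian, as UXN stores them.

```python
import re

# Pattern taken from the text above: '#' followed by 1-4 lowercase hex digits.
HEX_LITERAL = re.compile(r"#[0-9a-f]{1,4}")


def encode_hex_literal(word: str) -> bytes:
    """Return the value bytes of a `#` hex constant (hypothetical helper)."""
    if not HEX_LITERAL.fullmatch(word):
        raise ValueError(f"not a hex constant: {word!r}")
    digits = word[1:]
    # Assumption: 1-2 digits make a byte, 3-4 digits make a big-endian short.
    width = 1 if len(digits) <= 2 else 2
    return int(digits, 16).to_bytes(width, "big")


def encode_string_literal(word: str) -> bytes:
    """Return the bytes of a `"` string word (hypothetical helper)."""
    if not word.startswith('"') or len(word) < 2:
        raise ValueError(f"not a string word: {word!r}")
    if any(ch.isspace() for ch in word):
        raise ValueError("string words cannot contain whitespace")
    # The trailing 00 byte mirrors the `"foo` example in the text.
    return word[1:].encode("ascii") + b"\x00"
```

For example, `encode_string_literal('"foo').hex(" ")` gives `66 6f 6f 00`, matching the byte sequence described for `"foo`, and `encode_hex_literal("#ffff")` yields two bytes where `encode_hex_literal("#00")` yields one.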