Import the Calf project

Reid 'arrdem' McKenzie 2021-04-09 01:16:16 -06:00
parent c25e825a95
commit 9feb262454
27 changed files with 1699 additions and 0 deletions

10
projects/calf/BUILD Normal file

@ -0,0 +1,10 @@
package(default_visibility = ["//visibility:public"])
py_library(
name = "lib",
srcs = glob(["src/**/*.py"]),
imports = ["src"],
deps = [
py_requirement("pyrsistent"),
]
)

4
projects/calf/NOTES.md Normal file

@ -0,0 +1,4 @@
# Notes
https://github.com/Pyrlang/Pyrlang
https://en.wikipedia.org/wiki/Single_system_image

56
projects/calf/README.md Normal file

@ -0,0 +1,56 @@
# Calf
> Calf: Noun.
A young cow or ox.
Before I walked away from the Clojure space, I kept throwing around the idea of "ox", an ur-Clojure.
Ox was supposed to experiment with some ideas around immutable namespaces and code-as-data, which never came to fruition.
I found the JVM environment burdensome and difficult to maintain velocity in, and my own ideas too unformed to fit well into a rigorous object model.
Calf is a testbed.
It's supposed to be a lightweight, unstable, easy-for-me-to-hack-on substrate for exploring those old ideas and some new ones.
Particularly I'm interested in:
- compilers-as-databases (or using databases)
- stream processing and process models of computation more akin to Erlang
- reliability sensitive programming models (failure, recovery, process supervision)
I previously [blogged a bit](https://www.arrdem.com/2019/04/01/the_silver_tower/) about some ideas for what this could look like.
I'm convinced that a programming environment based around [virtual resiliency](https://www.microsoft.com/en-us/research/publication/a-m-b-r-o-s-i-a-providing-performant-virtual-resiliency-for-distributed-applications/) is a worthwhile goal (having independently invented it) and worth trying to bring to a mainstream general purpose platform like Python.
## Manifesto
In the last decade, immutability has been affirmed in the programming mainstream as an effective tool for making programs and state more manageable, and one which has been repeatedly implemented at acceptable performance costs.
Especially in messaging-based rather than state-sharing environments, immutability and "data"-oriented programming are becoming more and more common.
It also seems that much of the industry is moving towards message based reactive or network based connective systems.
Microservices seem to have won, and functions-as-a-service seem to be a rising trend reflecting a desire to offload or avoid deployment management rather than wrangle stateful services.
In these environments, programs begin to consist entirely of messaging with other programs over shared channels: traditional HTTP, RPC tools such as gRPC or ThriftMux, or message buses such as Kafka.
Key challenges with these connective services are:
- How they handle failure
- How they achieve reliability
- The ergonomic difficulties of building and deploying connective programs
- The operational difficulties of managing N-many 'reliable' services
Tools such as Argo and Airflow begin to talk about such networked or evented programs as DAGs, providing schedulers for sequencing actions and executors for performing them.
Airflow provides a programmable Python scheduler environment, but fails to provide an execution isolation boundary (such as a container or other subprocess/`fork()` boundary) allowing users to bring their own dependencies.
Instead Airflow users must build custom Airflow packagings which bundle dependencies into the Airflow instance.
This means that Airflow deployments can only be centralized with difficulty, due to shared dependencies and disparate dependency lifecycles, and it limits the platform's return on investment by increasing operational burden.
Argo ducks this mistake, providing a robust scheduler and leveraging k8s for its executor.
This allows Argo to be managed independently of any of the workloads it manages - a huge step forwards over Airflow - but this comes at considerable ergonomic costs for trivial tasks and provides a more limited scheduler.
Previously I developed a system which provided a much stronger DSL than Airflow's, but made the same key mistake of not decoupling execution from the scheduler/coordinator.
Calf is a sketch of a programming language and system with a nearly fully featured DSL, and decoupling between scheduling (control flow of programs) and execution of "terminal" actions.
In short, think a Py-Lisp where instead of doing FFI directly to the parent Python instance you do FFI by enqueuing a (potentially retryable!) request onto a shared cluster message bus, from which subscriber worker processes elsewhere provide request/response handling.
One could reasonably accuse this project of being an attempt to unify Erlang and a hosted Python to build a "BASH for distsys" tool while providing a multi-tenant execution platform that can be centrally managed.
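To make that concrete, here is a minimal sketch of the intended flow, assuming a Redis-style list as the shared bus (redis appears only under the `node` extra); the names `submit`, `worker_loop` and the `calf:*` keys are illustrative and do not exist in Calf yet:

```python
import json
import uuid


def submit(bus, fn_name, *args):
    """Enqueue a (retryable) request for a named 'terminal' action onto the shared bus."""
    request_id = str(uuid.uuid4())
    bus.rpush("calf:requests", json.dumps({"id": request_id, "fn": fn_name, "args": list(args)}))
    return request_id


def worker_loop(bus, handlers):
    """A worker elsewhere on the cluster: pop requests, run the handler, publish the result."""
    while True:
        _key, raw = bus.blpop("calf:requests")
        request = json.loads(raw)
        result = handlers[request["fn"]](*request["args"])
        bus.rpush("calf:responses:" + request["id"], json.dumps(result))
```

The point is that the "FFI" boundary is a durable queue rather than an in-process call, so requests can be retried, observed and multiplexed across tenants.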
## License
Copyright Reid 'arrdem' McKenzie, 3/5/2017.
Distributed under the terms of the MIT license.
See the included `LICENSE` file for more.

4
projects/calf/pytest.ini Normal file

@ -0,0 +1,4 @@
[pytest]
python_files=test_*.py
python_classes=Check
python_functions=test_*

45
projects/calf/setup.py Normal file

@ -0,0 +1,45 @@
#!/usr/bin/env python
from os import path
from setuptools import setup, find_namespace_packages
# Fetch the README contents
rootdir = path.abspath(path.dirname(__file__))
with open(path.join(rootdir, "README.md"), encoding="utf-8") as f:
long_description = f.read()
setup(
name="calf",
version="0.0.0",
long_description=long_description,
long_description_content_type="text/markdown",
packages=find_namespace_packages(include=["calf.*"]),
entry_points={
"console_scripts": [
# DSL testing stuff
"calf-lex = calf.lexer:main",
"calf-parse = calf.parser:main",
"calf-read = calf.reader:main",
"calf-analyze = calf.analyzer:main",
"calf-compile = calf.compiler:main",
# Client/server stuff
"calf-client = calf.client:main",
"calf-server = calf.server:main",
"calf-worker = calf.worker:main",
]
},
install_requires=[
"pyrsistent~=0.17.0",
],
extras_require={
"node": [
"flask~=1.1.0",
"pyyaml~=5.4.0",
"redis~=3.5.0",
],
},
)


@ -0,0 +1 @@
#!/usr/bin/env python3


@ -0,0 +1,3 @@
"""
The calf analyzer.
"""


@ -0,0 +1,86 @@
"""
Some shared scaffolding for building terminal "REPL" drivers.
"""
import curses
from curses.textpad import Textbox, rectangle
def curse_repl(handle_buffer):
def handle(buff, count):
try:
return list(handle_buffer(buff, count)), None
except Exception as e:
return None, e
def _main(stdscr: curses.window):
maxy, maxx = 0, 0
examples = []
count = 1
while 1:
# Prompt
maxy, maxx = stdscr.getmaxyx()
stdscr.clear()
stdscr.addstr(0, 0, "Enter example: (hit Ctrl-G to execute, Ctrl-C to exit)", curses.A_BOLD)
editwin = curses.newwin(5, maxx - 4,
2, 2)
rectangle(stdscr,
1, 1,
1 + 5 + 1, maxx - 2)
# Printing is part of the prompt
cur = 8
def putstr(str, x=0, attr=0):
# ya rly. I know exactly what I'm doing here
nonlocal cur
# This is how we handle going off the bottom of the screen lol
if cur < maxy:
stdscr.addstr(cur, x, str, attr)
cur += (len(str.split("\n")) or 1)
for ex, buff, vals, err in reversed(examples):
putstr(f"Example {ex}:", attr=curses.A_BOLD)
for l in buff.split("\n"):
putstr(f" | {l}")
putstr("")
if err:
err = str(err)
err = err.split("\n")
putstr(" Error:")
for l in err:
putstr(f" {l}", attr=curses.COLOR_YELLOW)
elif vals:
putstr(" Values:")
for x, t in zip(range(1, 1<<64), vals):
putstr(f" {x:>3}) " + repr(t))
putstr("")
stdscr.refresh()
# Read from the user
box = Textbox(editwin)
try:
box.edit()
except KeyboardInterrupt:
break
buff = box.gather().strip()
if not buff:
continue
vals, err = handle(buff, count)
examples.append((count, buff, vals, err))
count += 1
stdscr.refresh()
curses.wrapper(_main)


@ -0,0 +1,70 @@
"""
The elements of the Calf grammar. Used by the lexer.
"""
WHITESPACE = r"\n\r\s,"
DELIMS = r'%s\[\]\(\)\{\}:;#^"\'' % (WHITESPACE,)
SIMPLE_SYMBOL = r"([^{ds}\-\+\d][^{ds}]*)|([^{ds}\d]+)".format(ds=DELIMS)
SYMBOL_PATTERN = r"(((?P<namespace>{ss})/)?(?P<name>{ss}))".format(ss=SIMPLE_SYMBOL)
SIMPLE_INTEGER = r"[+-]?\d*"
FLOAT_PATTERN = r"(?P<body>({i})(\.(\d*))?)?([eE](?P<exponent>{i}))?".format(
i=SIMPLE_INTEGER
)
# HACK (arrdem 2021-03-13):
#
# The lexer is INCREMENTAL not TOTAL. It works by incrementally greedily
# building up strings that are PARTIAL matches. This means it has no support
# for the " closing anchor of a string, or the \n closing anchor of a comment.
# So we have to do this weird thing where the _required_ terminators are
# actually _optional_ here so that the parser works.
STRING_PATTERN = r'(""".*?(""")?)|("((\\"|[^"])*?)"?)'
COMMENT_PATTERN = r";(([^\n\r]*)(\n\r?)?)"
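# Illustrative only (not used by this module): because those closing anchors are
# optional, both a terminated and an unterminated string are *full* matches, which
# is what lets the incremental lexer keep extending a partial string token:
#
#   re.fullmatch(STRING_PATTERN, '"foo"')   # matches
#   re.fullmatch(STRING_PATTERN, '"foo')    # also matches (unterminated)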
TOKENS = [
# Paren (normal) lists
(r"\(", "PAREN_LEFT",),
(r"\)", "PAREN_RIGHT",),
# Bracket lists
(r"\[", "BRACKET_LEFT",),
(r"\]", "BRACKET_RIGHT",),
# Brace lists (maps)
(r"\{", "BRACE_LEFT",),
(r"\}", "BRACE_RIGHT",),
(r"\^", "META",),
(r"'", "SINGLE_QUOTE",),
(STRING_PATTERN, "STRING",),
(r"#", "MACRO_DISPATCH",),
# Symbols
(SYMBOL_PATTERN, "SYMBOL",),
# Numbers
(SIMPLE_INTEGER, "INTEGER",),
(FLOAT_PATTERN, "FLOAT",),
# Keywords
#
# Note: this is a dirty f'n hack in that in order for keywords to work, ":"
# has to be defined to be a valid keyword.
(r":" + SYMBOL_PATTERN + "?", "KEYWORD",),
# Whitespace
#
# Note that the whitespace token will contain at most one newline
(r"(\n\r?|[,\t ]*)", "WHITESPACE",),
# Comment
(COMMENT_PATTERN, "COMMENT",),
# Strings
(r'"(?P<body>(?:[^\"]|\.)*)"', "STRING"),
]
MATCHING = {
"PAREN_LEFT": "PAREN_RIGHT",
"BRACKET_LEFT": "BRACKET_RIGHT",
"BRACE_LEFT": "BRACE_RIGHT",
}
WHITESPACE_TYPES = {"WHITESPACE", "COMMENT"}


@ -0,0 +1,101 @@
"""
Various Reader class instances.
"""
class Position(object):
def __init__(self, offset, line, column):
self.offset = offset
self.line = line
self.column = column
def __repr__(self):
return "<Pos %r (%r:%r)>" % (self.offset, self.line, self.column)
def __str__(self):
return self.__repr__()
class PosReader(object):
"""A wrapper for anything that can be read from. Tracks offset, line and column information."""
def __init__(self, reader):
self.reader = reader
self.offset = 0
self.line = 1
self.column = 0
def read(self, n=1):
"""
Returns a pair (position, text) where position is the position of the first character of the
returned text, and text is a string of length at most `n`.
"""
p = self.position
if n == 1:
chr = self.reader.read(n)
if chr != "":
self.offset += 1
self.column += 1
if chr == "\n":
self.line += 1
self.column = 0
return (
p,
chr,
)
else:
return (
p,
"".join(self.read(n=1)[1] for i in range(n)),
)
@property
def position(self):
"""The position of the last character read."""
return Position(self.offset, self.line, self.column)
class PeekPosReader(PosReader):
"""A wrapper for anything that can be read from. Provides a way to peek the next character."""
def __init__(self, reader):
self.reader = reader if isinstance(reader, PosReader) else PosReader(reader)
self._peek = None
def read(self, n=1):
"""
Same as `PosReader.read`. Returns a pair (pos, text) where pos is the position of the first
read character and text is a string of length up to `n`. If a peeked character exists, it
is consumed by this operation.
"""
if self._peek and n == 1:
a = self._peek
self._peek = None
return a
else:
p, t = self._peek or (None, "")
if self._peek:
self._peek = None
p_, t_ = self.reader.read(n=(n if not t else n - len(t)))
p = p or p_
return (p, t + t_)
def peek(self):
"""Returns the (pos, text) pair which would be read next by read(n=1)."""
if self._peek is None:
self._peek = self.reader.read(n=1)
return self._peek
@property
def position(self):
"""The position of the last character read."""
return self.reader.position
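# Illustrative usage (assuming a stdlib io.StringIO as the backing reader):
#
#   >>> r = PeekPosReader(io.StringIO("ab"))
#   >>> r.peek()    # does not consume the character
#   (<Pos 0 (1:0)>, 'a')
#   >>> r.read()    # consumes the previously peeked character
#   (<Pos 0 (1:0)>, 'a')
#   >>> r.read()
#   (<Pos 1 (1:1)>, 'b')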


@ -0,0 +1,136 @@
"""
Calf lexer.
Provides machinery for lexing sources of text into sequences of tokens with textual information, as
well as buffer position information appropriate for either full AST parsing, lossless syntax tree
parsing, linting or other use.
"""
import io
import re
import sys
from calf.token import CalfToken
from calf.io.reader import PeekPosReader
from calf.grammar import TOKENS
from calf.util import *
class CalfLexer:
"""
Lexer object.
Wraps something you can read characters from, and presents a lazy sequence of Token objects.
Raises ValueError at any time due to either a conflict in the grammar being lexed, or incomplete
input. Exceptions from the backing reader object are not masked.
Rule order is used to decide conflicts. If multiple patterns would match an input, the "first"
in token list order wins.
"""
def __init__(self, stream, source=None, metadata=None, tokens=TOKENS):
"""FIXME"""
self._stream = (
PeekPosReader(stream) if not isinstance(stream, PeekPosReader) else stream
)
self.source = source
self.metadata = metadata or {}
self.tokens = tokens
def __next__(self):
"""
Tries to scan the next token off of the backing stream.
Starting with the list of all available token patterns, an empty buffer and a single new
character peeked from the backing stream, reads more characters so long as adding the next
character still leaves one or more possible matching "candidates" (token patterns).
When adding the next character from the stream would leave no candidates, a token of the
first remaining candidate type (in rule order) is generated.
At the end of input, if at least one candidate remains, a final token of the first such type
is generated. Otherwise we are in an incomplete input state, due either to truncated input
or to a grammar conflict.
"""
buffer = ""
candidates = self.tokens
position, chr = self._stream.peek()
while chr:
if not candidates:
raise ValueError("Entered invalid state - no candidates!")
buff2 = buffer + chr
can2 = [t for t in candidates if re.fullmatch(t[0], buff2)]
# Try to include the last read character to support longest-wins grammars
if not can2 and len(candidates) >= 1:
pat, type = candidates[0]
groups = re.match(re.compile(pat), buffer).groupdict()
groups.update(self.metadata)
return CalfToken(type, buffer, self.source, position, groups)
else:
# Update the buffers
buffer = buff2
candidates = can2
# consume the 'current' character for side-effects
self._stream.read()
# set chr to be the next peeked character
_, chr = self._stream.peek()
if len(candidates) >= 1:
pat, type = candidates[0]
groups = re.match(re.compile(pat), buffer).groupdict()
groups.update(self.metadata)
return CalfToken(type, buffer, self.source, position, groups)
else:
raise ValueError(
"Encountered end of buffer with incomplete token %r" % (buffer,)
)
def __iter__(self):
"""
Scans tokens out of the character stream.
May raise ValueError if there is either an issue with the grammar or the input.
Will not mask any exceptions from the backing reader.
"""
# While the character stream isn't empty
while self._stream.peek()[1] != "":
yield next(self)
def lex_file(path, metadata=None):
"""
Returns the sequence of tokens resulting from lexing all text in the named file.
"""
with open(path, "r") as f:
return list(CalfLexer(f, path, {}))
def lex_buffer(buffer, source="<Buffer>", metadata=None):
"""
Returns the lazy sequence of tokens resulting from lexing all the text in a buffer.
"""
return CalfLexer(io.StringIO(buffer), source, metadata)
def main():
"""A CURSES application for using the lexer."""
from calf.cursedrepl import curse_repl
def handle_buffer(buff, count):
return list(lex_buffer(buff, source=f"<Example {count}>"))
curse_repl(handle_buffer)
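# Illustrative usage (token types as exercised by the test suite):
#
#   >>> [t.type for t in lex_buffer("(:foo 1)")]
#   ['PAREN_LEFT', 'KEYWORD', 'WHITESPACE', 'INTEGER', 'PAREN_RIGHT']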


@ -0,0 +1,59 @@
"""
The Calf package infrastructure.
Calf's packaging infrastructure is very heavily inspired by Maven, and seeks first and foremost to
provide statically understandable, repeatable builds.
However the loading infrastructure is designed to simultaneously support from-source builds
appropriate to interactive development workflows and monorepos.
"""
from collections import namedtuple
class CalfLoaderConfig(namedtuple("CalfLoaderConfig", ["paths"])):
"""
"""
class CalfDelayedPackage(
namedtuple("CalfDelayedPackage", ["name", "version", "metadata", "path"])
):
"""
This structure represents the delay of loading a package.
Rather than eagerly analyze packages, it may be profitable to use lazy loading / lazy resolution
of symbols. It may also be possible to cache analyzing some packages.
"""
class CalfPackage(
namedtuple("CalfPackage", ["name", "version", "metadata", "modules"])
):
"""
This structure represents the result of forcing the load of a package, and is the product of
either loading a package directly, or a package becoming a direct dependency and being forced.
"""
def parse_package_requirement(config, env, requirement):
"""
:param config:
:param env:
:param requirement:
:returns:
"""
def analyze_package(config, env, package):
"""
:param config:
:param env:
:param package:
:returns:
Given a loader configuration and an environment to load into, analyzes the requested package,
returning an updated environment.
"""


@ -0,0 +1,249 @@
"""
The Calf parser.
"""
from collections import namedtuple
from itertools import tee
import logging
import sys
from typing import NamedTuple, Callable
from calf.lexer import CalfLexer, lex_buffer, lex_file
from calf.grammar import MATCHING, WHITESPACE_TYPES
from calf.token import *
log = logging.getLogger(__name__)
def mk_list(contents, open=None, close=None):
return CalfListToken(
"LIST", contents, open.source, open.start_position, close.start_position
)
def mk_sqlist(contents, open=None, close=None):
return CalfListToken(
"SQLIST", contents, open.source, open.start_position, close.start_position
)
def pairwise(l: list) -> iter:
"s -> (s0,s1), (s2,s3), (s4, s5), ..."
return zip(l[::2], l[1::2])
def mk_dict(contents, open=None, close=None):
# FIXME (arrdem 2021-03-14):
# Raise a real SyntaxError of some sort.
assert len(contents) % 2 == 0, "Improper dict!"
return CalfDictToken(
"DICT",
list(pairwise(contents)),
open.source,
open.start_position,
close.start_position,
)
def mk_str(token):
buff = token.value
if buff.startswith('"""') and not buff.endswith('"""'):
raise ValueError('Unterminated triple quote string')
elif buff.startswith('"') and not buff.endswith('"'):
raise ValueError('Unterminated quote string')
elif not buff.startswith('"') or buff == '"' or buff == '"""':
raise ValueError('Illegal string')
if buff.startswith('"""'):
buff = buff[3:-3]
else:
buff = buff[1:-1]
buff = buff.encode("utf-8").decode("unicode_escape") # Handle escape codes
return CalfStrToken(token, buff)
CTORS = {
"PAREN_LEFT": mk_list,
"BRACKET_LEFT": mk_sqlist,
"BRACE_LEFT": mk_dict,
"STRING": mk_str,
"INTEGER": CalfIntegerToken,
"FLOAT": CalfFloatToken,
"SYMBOL": CalfSymbolToken,
"KEYWORD": CalfKeywordToken,
}
class CalfParseError(Exception):
"""
Base class for representing errors encountered parsing.
"""
def __init__(self, message: str, token: CalfToken):
super(Exception, self).__init__(message)
self.token = token
def __str__(self):
return f"Parse error at {self.token.loc()}: " + super().__str__()
class CalfUnexpectedCloseParseError(CalfParseError):
"""
Represents encountering an unexpected close token.
"""
def __init__(self, token, matching_open=None):
msg = f"encountered unexpected closing {token!r}"
if matching_open:
msg += f" which appears to match {matching_open!r}"
super(CalfParseError, self).__init__(msg, token)
self.token = token
self.matching_open = matching_open
class CalfMissingCloseParseError(CalfParseError):
"""
Represents a failure to encounter an expected close token.
"""
def __init__(self, expected_close_token, open_token):
super(CalfMissingCloseParseError, self).__init__(
f"expected {expected_close_token} starting from {open_token}, got end of file.",
open_token
)
self.expected_close_token = expected_close_token
def parse_stream(stream,
discard_whitespace: bool = True,
discard_comments: bool = True,
stack: list = None):
"""Parses a token stream, producing a lazy sequence of all read top level forms.
If `discard_whitespace` is truthy, then no WHITESPACE tokens will be emitted
into the resulting parse tree. Otherwise, WHITESPACE tokens will be
included. Whether WHITESPACE tokens are included or not, the tokens of the
tree will reflect original source locations.
"""
stack = stack or []
def recur(_stack = None):
yield from parse_stream(stream,
discard_whitespace,
discard_comments,
_stack or stack)
for token in stream:
# Whitespace discarding
if token.type == "WHITESPACE" and discard_whitespace:
continue
elif token.type == "COMMENT" and discard_comments:
continue
# Built in reader macros
elif token.type == "META":
try:
meta_t = next(recur())
except StopIteration:
raise CalfParseError("^ not followed by meta value", token)
try:
value_t = next(recur())
except StopIteration:
raise CalfParseError("^ not followed by value", token)
yield CalfMetaToken(token, meta_t, value_t)
elif token.type == "MACRO_DISPATCH":
try:
dispatch_t = next(recur())
except StopIteration:
raise CalfParseError("# not followed by dispatch value", token)
try:
value_t = next(recur())
except StopIteration:
raise CalfParseError("^ not followed by value", token)
yield CalfDispatchToken(token, dispatch_t, value_t)
elif token.type == "SINGLE_QUOTE":
try:
quoted_t = next(recur())
except StopIteration:
raise CalfParseError("' not followed by quoted form", token)
yield CalfQuoteToken(token, quoted_t)
# Compounds
elif token.type in MATCHING.keys():
balancing = MATCHING[token.type]
elements = list(recur(stack + [(balancing, token)]))
# Elements MUST have at least the close token in it
if not elements:
raise CalfMissingCloseParseError(balancing, token)
elements, close = elements[:-1], elements[-1]
if close.type != MATCHING[token.type]:
raise CalfMissingCloseParseError(balancing, token)
yield CTORS[token.type](elements, token, close)
elif token.type in MATCHING.values():
# Case of matching the immediate open
if stack and token.type == stack[-1][0]:
yield token
break
# Case of maybe matching something else, but definitely being wrong
else:
matching = next(reversed([t[1] for t in stack if t[0] == token.type]), None)
raise CalfUnexpectedCloseParseError(token, matching)
# Atoms
elif token.type in CTORS:
yield CTORS[token.type](token)
else:
yield token
def parse_buffer(buffer,
discard_whitespace=True,
discard_comments=True):
"""
Parses a buffer, producing a lazy sequence of all parsed top level forms.
Propagates all errors.
"""
yield from parse_stream(lex_buffer(buffer),
discard_whitespace,
discard_comments)
def parse_file(file):
"""
Parses a file, producing a lazy sequence of all parsed top level forms.
"""
yield from parse_stream(lex_file(file))
def main():
"""A CURSES application for using the parser."""
from calf.cursedrepl import curse_repl
def handle_buffer(buff, count):
return list(parse_stream(lex_buffer(buff, source=f"<Example {count}>")))
curse_repl(handle_buffer)
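# Illustrative usage (mirroring the test suite; whitespace is discarded by default):
#
#   >>> l_t = next(parse_buffer("(:foo 1)"))
#   >>> l_t.type
#   'LIST'
#   >>> [t.type for t in l_t]
#   ['KEYWORD', 'INTEGER']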


@ -0,0 +1,156 @@
"""The Calf reader
Unlike the lexer and parser which are mostly information preserving, the reader
is designed to be a somewhat pluggable structure for implementing transforms and
discarding information.
"""
from typing import *
from calf.lexer import lex_buffer, lex_file
from calf.parser import parse_stream
from calf.token import *
from calf.types import *
class CalfReader(object):
def handle_keyword(self, t: CalfToken) -> Any:
"""Convert a token to an Object value for a symbol.
Implementations could convert kws to strings, to a dataclass of some
sort, use interning, or do none of the above.
"""
return Keyword.of(t.more.get("name"), t.more.get("namespace"))
def handle_symbol(self, t: CalfToken) -> Any:
"""Convert a token to an Object value for a symbol.
Implementations could convert syms to strings, to a dataclass of some
sort, use interning, or do none of the above.
"""
return Symbol.of(t.more.get("name"), t.more.get("namespace"))
def handle_dispatch(self, t: CalfDispatchToken) -> Any:
"""Handle a #foo <> dispatch token.
Implementations may choose how dispatch is mapped to values, for
instance by imposing a static mapping or by calling out to runtime state
or other data sources to implement this hook. It's intended to be an
open dispatch mechanism, unlike the others which should have relatively
defined behavior.
The default implementation simply preserves the dispatch token.
"""
return t
def handle_meta(self, t: CalfMetaToken) -> Any:
"""Handle a ^<> <> so called 'meta' token.
Implementations may choose how to process metadata, discarding it or
consuming it somehow.
The default implementation simply discards the tag value.
"""
return self.read1(t.value)
def make_quote(self):
"""Factory. Returns the quote or equivalent symbol. May use `self.make_symbol()` to do so."""
return Symbol.of("quote")
def handle_quote(self, t: CalfQuoteToken) -> Any:
"""Handle a 'foo quote form."""
return Vector.of([self.make_quote(), self.read1(t.value)])
def read1(self, t: CalfToken) -> Any:
# Note: 'square' and 'round' lists are treated the same. This should be
# a hook. Should {} be a "list" too until it gets reader hooked into
# being a mapping or a set?
if isinstance(t, CalfListToken):
return Vector.of(self.read(t.value))
elif isinstance(t, CalfDictToken):
return Map.of([(self.read1(k), self.read1(v))
for k, v in t.items()])
# Magical pairwise stuff
elif isinstance(t, CalfQuoteToken):
return self.handle_quote(t)
elif isinstance(t, CalfMetaToken):
return self.handle_meta(t)
elif isinstance(t, CalfDispatchToken):
return self.handle_dispatch(t)
# Stuff with real factories
elif isinstance(t, CalfKeywordToken):
return self.handle_keyword(t)
elif isinstance(t, CalfSymbolToken):
return self.handle_symbol(t)
# Terminals
elif isinstance(t, CalfStrToken):
return str(t)
elif isinstance(t, CalfIntegerToken):
return int(t)
elif isinstance(t, CalfFloatToken):
return float(t)
else:
raise ValueError(f"Unsupported token type {t!r} ({type(t)})")
def read(self, stream):
"""Given a sequence of tokens, read 'em."""
for t in stream:
yield self.read1(t)
def read_stream(stream,
reader: CalfReader = None):
"""Read from a stream of parsed tokens.
"""
reader = reader or CalfReader()
yield from reader.read(stream)
def read_buffer(buffer):
"""Read from a buffer, producing a lazy sequence of all top level forms.
"""
yield from read_stream(parse_stream(lex_buffer(buffer)))
def read_file(file):
"""Read from a file, producing a lazy sequence of all top level forms.
"""
yield from read_stream(parse_stream(lex_file(file)))
def main():
"""A CURSES application for using the reader."""
from calf.cursedrepl import curse_repl
def handle_buffer(buff, count):
return list(read_stream(parse_stream(lex_buffer(buff, source=f"<Example {count}>"))))
curse_repl(handle_buffer)
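# Illustrative sketch (hypothetical subclass, not part of this module): the reader's
# hooks are intended to be overridden, e.g. to read keywords and symbols as plain
# strings rather than Keyword/Symbol values.
#
#   class StringlyReader(CalfReader):
#       def handle_keyword(self, t: CalfToken) -> Any:
#           return ":" + t.more.get("name")
#
#       def handle_symbol(self, t: CalfToken) -> Any:
#           return t.more.get("name")
#
#   for value in read_stream(parse_stream(lex_buffer(":foo")), reader=StringlyReader()):
#       print(value)  # prints ":foo"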


@ -0,0 +1,239 @@
"""
Tokens.
The philosophy here is that to the greatest extent possible we want to preserve lexical (source)
information about indentation, position and soforth. That we have to do so well mutably is just a
pain in the ass and kinda unavoidable.
Consequently, this file defines classes which wrap core Python primitives, providing all the usual
bits in terms of acting like values, while preserving fairly extensive source information.
"""
class CalfToken:
"""
Token object.
The result of reading a token from the source character feed.
Encodes the source, and the position in the source from which it was read.
"""
def __init__(self, type, value, source, start_position, more):
self.type = type
self.value = value
self.source = source
self.start_position = start_position
self.more = more if more is not None else {}
def __repr__(self):
return "<%s:%s %r %s %r>" % (
type(self).__name__,
self.type,
self.value,
self.loc(),
self.more,
)
def loc(self):
return "%r@%r:%r" % (
self.source,
self.line,
self.column,
)
def __str__(self):
return self.value
@property
def offset(self):
if self.start_position is not None:
return self.start_position.offset
@property
def line(self):
if self.start_position is not None:
return self.start_position.line
@property
def column(self):
if self.start_position is not None:
return self.start_position.column
class CalfBlockToken(CalfToken):
"""
(Block) Token object.
The base result of parsing a token with a start and an end position.
"""
def __init__(self, type, value, source, start_position, end_position, more):
CalfToken.__init__(self, type, value, source, start_position, more)
self.end_position = end_position
class CalfListToken(CalfBlockToken, list):
"""
(list) Token object.
The final result of reading a parens list through the Calf lexer stack.
"""
def __init__(self, type, value, source, start_position, end_position):
CalfBlockToken.__init__(
self, type, value, source, start_position, end_position, None
)
list.__init__(self, value)
class CalfDictToken(CalfBlockToken, dict):
"""
(dict) Token object.
The final(ish) result of reading a braces list through the Calf lexer stack.
"""
def __init__(self, type, value, source, start_position, end_position):
CalfBlockToken.__init__(
self, type, value, source, start_position, end_position, None
)
dict.__init__(self, value)
class CalfIntegerToken(CalfToken, int):
"""
(int) Token object.
The final(ish) result of reading an integer.
"""
def __new__(cls, value):
return int.__new__(cls, value.value)
def __init__(self, value):
CalfToken.__init__(
self,
value.type,
value.value,
value.source,
value.start_position,
value.more,
)
class CalfFloatToken(CalfToken, float):
"""
(float) Token object.
The final(ish) result of reading a float.
"""
def __new__(cls, value):
return float.__new__(cls, value.value)
def __init__(self, value):
CalfToken.__init__(
self,
value.type,
value.value,
value.source,
value.start_position,
value.more,
)
class CalfStrToken(CalfToken, str):
"""
(str) Token object.
The final(ish) result of reading a string.
"""
def __new__(cls, token, buff):
return str.__new__(cls, buff)
def __init__(self, token, buff):
CalfToken.__init__(
self,
token.type,
buff,
token.source,
token.start_position,
token.more,
)
str.__init__(self)
class CalfSymbolToken(CalfToken):
"""A symbol."""
def __init__(self, token):
CalfToken.__init__(
self,
token.type,
token.value,
token.source,
token.start_position,
token.more,
)
class CalfKeywordToken(CalfToken):
"""A keyword."""
def __init__(self, token):
CalfToken.__init__(
self,
token.type,
token.value,
token.source,
token.start_position,
token.more,
)
class CalfMetaToken(CalfToken):
"""A ^ meta token."""
def __init__(self, token, meta, value):
CalfToken.__init__(
self,
token.type,
value,
token.source,
token.start_position,
token.more,
)
self.meta = meta
class CalfDispatchToken(CalfToken):
"""A # macro dispatch token."""
def __init__(self, token, tag, value):
CalfToken.__init__(
self,
token.type,
value,
token.source,
token.start_position,
token.more,
)
self.tag = tag
class CalfQuoteToken(CalfToken):
"""A ' quotation."""
def __init__(self, token, quoted):
CalfToken.__init__(
self,
token.type,
quoted,
token.source,
token.start_position,
token.more,
)
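# Illustrative only: the value-like wrappers behave as the primitive they wrap while
# retaining source information. Given a token `t` lexed from the start of the buffer "42":
#
#   >>> i = CalfIntegerToken(t)
#   >>> i + 1
#   43
#   >>> (i.line, i.column)
#   (1, 0)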


@ -0,0 +1,44 @@
"""Core types for Calf.
I don't love baking these in, but there's one place to start and there'll be a
considerable amount of bootstrappy nonsense to get through. So just start with
good ol' fashioned types and type aliases.
"""
from typing import *
import pyrsistent as p
class Symbol(NamedTuple):
name: str
namespace: Optional[str]
@classmethod
def of(cls, name: str, namespace: str = None):
return cls(name, namespace)
class Keyword(NamedTuple):
name: str
namespace: Optional[str]
@classmethod
def of(cls, name: str, namespace: str = None):
return cls(name, namespace)
# FIXME (arrdem 2021-03-20):
#
# Don't just go out to Pyrsistent for the datatypes. Do something somewhat
# smarter, especially given the games Pyrsistent is playing around loading
# ctype implementations for performance. God only knows about correctness tho.
Map = p.PMap
Map.of = staticmethod(p.pmap)
Vector = p.PVector
Vector.of = staticmethod(p.pvector)
Set = p.PSet
Set.of = staticmethod(p.pset)
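# Illustrative usage (assuming pyrsistent is installed):
#
#   >>> Keyword.of("foo", "user")
#   Keyword(name='foo', namespace='user')
#   >>> Map.of([(1, 2)])
#   pmap({1: 2})
#   >>> Vector.of([1, 2, 3])
#   pvector([1, 2, 3])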


@ -0,0 +1,23 @@
"""
Bits and bats.
Mainly bats.
"""
import re
def memoize(f):
memo = {}
def helper(x):
if x not in memo:
memo[x] = f(x)
return memo[x]
return helper
@memoize
def re_mem(regex):
return re.compile(regex)
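# Illustrative only: re_mem compiles a pattern once and returns the cached compiled
# object on subsequent calls with the same string:
#
#   >>> re_mem(r"\d+") is re_mem(r"\d+")
#   True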

20
projects/calf/tests/BUILD Normal file

@ -0,0 +1,20 @@
py_library(
name = "conftest",
srcs = [
"conftest.py"
],
imports = [
"."
],
)
py_pytest(
name = "test",
srcs = glob(["*.py"]),
deps = [
"//projects/calf:lib",
":conftest",
py_requirement("pytest-cov"),
],
args = ["--cov-report", "term", "--cov=calf"],
)


@ -0,0 +1,7 @@
"""
Fixtures for testing Calf.
"""
import pytest
parametrize = pytest.mark.parametrize


@ -0,0 +1,30 @@
"""
Tests covering the Calf grammar.
"""
import re
from calf import grammar as cg
from conftest import parametrize
@parametrize('ex', [
# Proper strings
'""',
'"foo bar"',
'"foo\n bar\n\r qux"',
'"foo\\"bar"',
'""""""',
'"""foo bar baz"""',
'"""foo "" "" "" bar baz"""',
# Unterminated string cases
'"',
'"f',
'"foo bar',
'"foo\\" bar',
'"""foo bar baz',
])
def test_match_string(ex):
assert re.fullmatch(cg.STRING_PATTERN, ex)


@ -0,0 +1,89 @@
"""
Tests of calf.lexer
Tests basic functionality with some examples, and makes sure that arbitrary token sequences
round trip through the lexer.
"""
import calf.lexer as cl
from conftest import parametrize
import pytest
def lex_single_token(buffer):
"""Lexes a single token from the buffer."""
return next(iter(cl.lex_buffer(buffer)))
@parametrize(
"text,token_type",
[
("(", "PAREN_LEFT",),
(")", "PAREN_RIGHT",),
("[", "BRACKET_LEFT",),
("]", "BRACKET_RIGHT",),
("{", "BRACE_LEFT",),
("}", "BRACE_RIGHT",),
("^", "META",),
("#", "MACRO_DISPATCH",),
("'", "SINGLE_QUOTE"),
("foo", "SYMBOL",),
("foo/bar", "SYMBOL"),
(":foo", "KEYWORD",),
(":foo/bar", "KEYWORD",),
(" ,,\t ,, \t", "WHITESPACE",),
("\n\r", "WHITESPACE"),
("\n", "WHITESPACE"),
(" , ", "WHITESPACE",),
("; this is a sample comment\n", "COMMENT"),
('"foo"', "STRING"),
('"foo bar baz"', "STRING"),
],
)
def test_lex_examples(text, token_type):
t = lex_single_token(text)
assert t.value == text
assert t.type == token_type
@parametrize(
"text,token_types",
[
("foo^bar", ["SYMBOL", "META", "SYMBOL"]),
("foo bar", ["SYMBOL", "WHITESPACE", "SYMBOL"]),
("foo-bar", ["SYMBOL"]),
("foo\nbar", ["SYMBOL", "WHITESPACE", "SYMBOL"]),
(
"{[^#()]}",
[
"BRACE_LEFT",
"BRACKET_LEFT",
"META",
"MACRO_DISPATCH",
"PAREN_LEFT",
"PAREN_RIGHT",
"BRACKET_RIGHT",
"BRACE_RIGHT",
],
),
("+", ["SYMBOL"]),
("-", ["SYMBOL"]),
("1", ["INTEGER"]),
("-1", ["INTEGER"]),
("-1.0", ["FLOAT"]),
("-1e3", ["FLOAT"]),
("+1.3e", ["FLOAT"]),
("f", ["SYMBOL"]),
("f1", ["SYMBOL"]),
("f1g2", ["SYMBOL"]),
("foo13-bar", ["SYMBOL"]),
("foo+13-12bar", ["SYMBOL"]),
("+-+-+-+-+", ["SYMBOL"]),
],
)
def test_lex_compound_examples(text, token_types):
t = cl.lex_buffer(text)
result_types = [token.type for token in t]
assert result_types == token_types


@ -0,0 +1,219 @@
"""
Tests of calf.parser
"""
import calf.parser as cp
from conftest import parametrize
import pytest
@parametrize("text", [
'"',
'"foo bar',
'"""foo bar',
'"""foo bar"',
])
def test_bad_strings_raise(text):
"""Tests asserting we won't let obviously bad strings fly."""
# FIXME (arrdem 2021-03-13):
# Can we provide this behavior in the lexer rather than in the parser?
with pytest.raises(ValueError):
next(cp.parse_buffer(text))
@parametrize("text", [
"[1.0",
"(1.0",
"{1.0",
])
def test_unterminated_raises(text):
"""Tests asserting that we don't let unterminated collections parse."""
with pytest.raises(cp.CalfMissingCloseParseError):
next(cp.parse_buffer(text))
@parametrize("text", [
"[{]",
"[(]",
"({)",
"([)",
"{(}",
"{[}",
])
def test_unbalanced_raises(text):
"""Tests asserting that we don't let missmatched collections parse."""
with pytest.raises(cp.CalfUnexpectedCloseParseError):
next(cp.parse_buffer(text))
@parametrize("buff, value", [
('"foo"', "foo"),
('"foo\tbar"', "foo\tbar"),
('"foo\n\rbar"', "foo\n\rbar"),
('"foo\\"bar\\""', "foo\"bar\""),
('"""foo"""', 'foo'),
('"""foo"bar"baz"""', 'foo"bar"baz'),
])
def test_strings_round_trip(buff, value):
assert next(cp.parse_buffer(buff)) == value
@parametrize('text, element_types', [
# Integers
("(1)", ["INTEGER"]),
("( 1 )", ["INTEGER"]),
("(,1,)", ["INTEGER"]),
("(1\n)", ["INTEGER"]),
("(\n1\n)", ["INTEGER"]),
("(1, 2, 3, 4)", ["INTEGER", "INTEGER", "INTEGER", "INTEGER"]),
# Floats
("(1.0)", ["FLOAT"]),
("(1.0e0)", ["FLOAT"]),
("(1e0)", ["FLOAT"]),
("(1e0)", ["FLOAT"]),
# Symbols
("(foo)", ["SYMBOL"]),
("(+)", ["SYMBOL"]),
("(-)", ["SYMBOL"]),
("(*)", ["SYMBOL"]),
("(foo-bar)", ["SYMBOL"]),
("(+foo-bar+)", ["SYMBOL"]),
("(+foo-bar+)", ["SYMBOL"]),
("( foo bar )", ["SYMBOL", "SYMBOL"]),
# Keywords
("(:foo)", ["KEYWORD"]),
("( :foo )", ["KEYWORD"]),
("(\n:foo\n)", ["KEYWORD"]),
("(,:foo,)", ["KEYWORD"]),
("(:foo :bar)", ["KEYWORD", "KEYWORD"]),
("(:foo :bar 1)", ["KEYWORD", "KEYWORD", "INTEGER"]),
# Strings
('("foo", "bar", "baz")', ["STRING", "STRING", "STRING"]),
# Lists
('([] [] ())', ["SQLIST", "SQLIST", "LIST"]),
])
def test_parse_list(text, element_types):
"""Test we can parse various lists of contents."""
l_t = next(cp.parse_buffer(text, discard_whitespace=True))
assert l_t.type == "LIST"
assert [t.type for t in l_t] == element_types
@parametrize('text, element_types', [
# Integers
("[1]", ["INTEGER"]),
("[ 1 ]", ["INTEGER"]),
("[,1,]", ["INTEGER"]),
("[1\n]", ["INTEGER"]),
("[\n1\n]", ["INTEGER"]),
("[1, 2, 3, 4]", ["INTEGER", "INTEGER", "INTEGER", "INTEGER"]),
# Floats
("[1.0]", ["FLOAT"]),
("[1.0e0]", ["FLOAT"]),
("[1e0]", ["FLOAT"]),
("[1e0]", ["FLOAT"]),
# Symbols
("[foo]", ["SYMBOL"]),
("[+]", ["SYMBOL"]),
("[-]", ["SYMBOL"]),
("[*]", ["SYMBOL"]),
("[foo-bar]", ["SYMBOL"]),
("[+foo-bar+]", ["SYMBOL"]),
("[+foo-bar+]", ["SYMBOL"]),
("[ foo bar ]", ["SYMBOL", "SYMBOL"]),
# Keywords
("[:foo]", ["KEYWORD"]),
("[ :foo ]", ["KEYWORD"]),
("[\n:foo\n]", ["KEYWORD"]),
("[,:foo,]", ["KEYWORD"]),
("[:foo :bar]", ["KEYWORD", "KEYWORD"]),
("[:foo :bar 1]", ["KEYWORD", "KEYWORD", "INTEGER"]),
# Strings
('["foo", "bar", "baz"]', ["STRING", "STRING", "STRING"]),
# Lists
('[[] [] ()]', ["SQLIST", "SQLIST", "LIST"]),
])
def test_parse_sqlist(text, element_types):
"""Test we can parse various 'square' lists of contents."""
l_t = next(cp.parse_buffer(text, discard_whitespace=True))
assert l_t.type == "SQLIST"
assert [t.type for t in l_t] == element_types
@parametrize('text, element_pairs', [
("{}",
[]),
("{:foo 1}",
[["KEYWORD", "INTEGER"]]),
("{:foo 1, :bar 2}",
[["KEYWORD", "INTEGER"],
["KEYWORD", "INTEGER"]]),
("{foo 1, bar 2}",
[["SYMBOL", "INTEGER"],
["SYMBOL", "INTEGER"]]),
("{foo 1, bar -2}",
[["SYMBOL", "INTEGER"],
["SYMBOL", "INTEGER"]]),
("{foo 1, bar -2e0}",
[["SYMBOL", "INTEGER"],
["SYMBOL", "FLOAT"]]),
("{foo ()}",
[["SYMBOL", "LIST"]]),
("{foo []}",
[["SYMBOL", "SQLIST"]]),
("{foo {}}",
[["SYMBOL", "DICT"]]),
('{"foo" {}}',
[["STRING", "DICT"]])
])
def test_parse_dict(text, element_pairs):
"""Test we can parse various mappings."""
d_t = next(cp.parse_buffer(text, discard_whitespace=True))
assert d_t.type == "DICT"
assert [[t.type for t in pair] for pair in d_t.value] == element_pairs
@parametrize("text", [
"{1}",
"{1, 2, 3}",
"{:foo}",
"{:foo :bar :baz}"
])
def test_parse_bad_dict(text):
"""Assert that dicts with missmatched pairs don't parse."""
with pytest.raises(Exception):
next(cp.parse_buffer(text))
@parametrize("text", [
"()",
"(1 1.1 1e2 -2 foo :foo foo/bar :foo/bar [{},])",
"{:foo bar, :baz [:qux]}",
"'foo",
"'[foo bar :baz 'qux, {}]",
"#foo []",
"^{} bar",
])
def test_examples(text):
"""Shotgun examples showing we can parse some stuff."""
assert list(cp.parse_buffer(text))


@ -0,0 +1,22 @@
"""
"""
from conftest import parametrize
from calf.reader import read_buffer
@parametrize('text', [
"()",
"[]",
"[[[[[[[[[]]]]]]]]]",
"{1 {2 {}}}",
'"foo"',
"foo",
"'foo",
"^foo bar",
"^:foo bar",
"{\"foo\" '([:bar ^:foo 'baz 3.14159e0])}",
"[:foo bar 'baz lo/l, 1, 1.2. 1e-5 -1e2]",
])
def test_read(text):
assert list(read_buffer(text))


@ -0,0 +1,17 @@
"""
Tests covering the Calf types.
"""
from calf import types as t
def test_maps_check():
assert isinstance(t.Map.of([(1, 2)]), t.Map)
def test_vectors_check():
assert isinstance(t.Vector.of([(1, 2)]), t.Vector)
def test_sets_check():
assert isinstance(t.Set.of([(1, 2)]), t.Set)


@ -1,3 +1,6 @@
"""A shim for executing pytest."""
import os
import sys
import pytest
@ -9,4 +12,7 @@ if __name__ == "__main__":
cmdline = ["--ignore=external"] + sys.argv[1:] cmdline = ["--ignore=external"] + sys.argv[1:]
print(cmdline, file=sys.stderr) print(cmdline, file=sys.stderr)
for e in sys.path:
print(f" - {os.path.realpath(e)}", file=sys.stderr)
sys.exit(pytest.main(cmdline))


@ -9,6 +9,7 @@ certifi==2020.12.5
chardet==4.0.0
click==7.1.2
commonmark==0.9.1
coverage==5.5
docutils==0.17
idna==2.10
imagesize==1.2.0
@ -37,6 +38,7 @@ Pygments==2.8.1
pyparsing==2.4.7
pyrsistent==0.17.3
pytest==6.2.3
pytest-cov==2.11.1
pytest-pudb==0.7.0
pytz==2021.1
PyYAML==5.4.1


@ -26,6 +26,7 @@ LICENSES_BY_LOWERNAME = {
"apache 2.0": "License :: OSI Approved :: Apache Software License", "apache 2.0": "License :: OSI Approved :: Apache Software License",
"apache": "License :: OSI Approved :: Apache Software License", "apache": "License :: OSI Approved :: Apache Software License",
"bsd 3 clause": "License :: OSI Approved :: BSD License", "bsd 3 clause": "License :: OSI Approved :: BSD License",
"bsd 3-clause": "License :: OSI Approved :: BSD License",
"bsd": "License :: OSI Approved :: BSD License", "bsd": "License :: OSI Approved :: BSD License",
"gplv3": "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "gplv3": "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
"http://www.apache.org/licenses/license-2.0": "License :: OSI Approved :: Apache Software License", "http://www.apache.org/licenses/license-2.0": "License :: OSI Approved :: Apache Software License",