Import datalog-shell

parent 318f7caa6a
commit 633060910c

6 changed files with 522 additions and 0 deletions
projects/datalog-shell/BUILD (new file)
@@ -0,0 +1,9 @@
py_binary(
    name = "datalog-shell",
    main = "__main__.py",
    deps = [
        "//projects/datalog",
        py_requirement("prompt_toolkit"),
        py_requirement("yaspin"),
    ],
)
projects/datalog-shell/Makefile (new file)
@@ -0,0 +1,18 @@
.PHONY: deploy test

deploy: .dev
	source .dev/bin/activate; pip install twine; rm -r dist; python setup.py sdist; twine upload dist/*;

.dev:
	virtualenv --python=`which python3` .dev
	source .dev/bin/activate; pip install pytest; python setup.py develop

node_modules/canopy:
	npm install canopy

src/datalog/parser.py: node_modules/canopy src/datalog.peg
	node_modules/canopy/bin/canopy --lang=python src/datalog.peg
	mv src/datalog.py src/datalog/parser.py

test: .dev $(wildcard src/**/*) $(wildcard test/**/*)
	source .dev/bin/activate; PYTHONPATH=".:src/" pytest -vv
projects/datalog-shell/README.md (new file)
@@ -0,0 +1,179 @@
# Datalog.Shell

A shell for my Datalog engine.

## What is Datalog?

[Datalog](https://en.wikipedia.org/wiki/Datalog) is a fully
declarative language for expressing relational data and queries,
typically written using a syntactic subset of Prolog. Its most
interesting feature compared to other relational languages such as SQL
is that it features production rules.

Briefly, a datalog database consists of rules and tuples. Tuples are
written `a(b, "c", 126, ...).`, require no declaration (e.g. of a
table), and may be of arbitrary, even varying, length. The elements of
a tuple are strings, which may be written as bare words or quoted.

In the interpreter (or a file), we could define a small graph like so -

```
$ datalog
>>> edge(a, b).
⇒ edge('a', 'b')
>>> edge(b, c).
⇒ edge('b', 'c')
>>> edge(c, d).
⇒ edge('c', 'd')
```
But how can we query this? We can issue queries by entering a tuple
terminated with `?` instead of `.`.

For instance, we could query whether some tuples exist in the database -

```
>>> edge(a, b)?
⇒ edge('a', 'b')
>>> edge(d, f)?
⇒ Ø
>>>
```

We did define `edge(a, b).`, so our query returns that tuple. However
the tuple `edge(d, f).` was not defined, so our query produces no
results. Rather than printing nothing, the `Ø` symbol, which denotes
the empty set, is printed for clarity.

This is correct, but uninteresting. How can we find, say, all the
edges from `a`? We don't have a construct like wildcards with which to
match anything - yet.
Enter logic variables. Logic variables are capitalized words, `X`,
`Foo` and the like, which are interpreted as wildcards by the query
engine. Capitalized words are always understood as logic variables.

```
>>> edge(a, X)?
⇒ edge('a', 'b')
```

However, unlike wildcards, which simply match anything, logic
variables are unified within a query. Were we to write `edge(X, X)?`
we would be asking for the set of tuples such that both elements of
the `edge` tuple equate.

```
>>> edge(X, X)?
⇒ Ø
```

Of which we have none.
But what if we wanted to find paths between edges? Say, to check
whether a path exists from `a` to `d`. We'd need a way to unify many
logic variables together - and so far we've only seen queries of a
single tuple.

Enter rules. We can define productions by which the Datalog engine can
produce new tuples. Rules are written as a tuple "pattern" which may
contain constants or logic variables, followed by the `:-` assignment
operator and a comma-separated sequence of "clauses".

Rules are perhaps best understood as subqueries. A rule defines an
indefinite set of tuples such that, over that set, the query clauses
are simultaneously satisfied. This is how we achieve complex queries.

There is no alternation - or - operator within a rule's body. However,
rules can share the same tuple "pattern".

So if we wanted, say, to find paths between edges in our database, we
could do so using two rules: one which defines a "simple" path, and
one which defines a path from `X` to `Y` recursively, by querying for
an edge from `X` to an unconstrained `Z`, and then unifying that with
`path(Z, Y)`.

```
>>> path(X, Y) :- edge(X, Y).
⇒ path('X', 'Y') :- edge('X', 'Y').
>>> path(X, Y) :- edge(X, Z), path(Z, Y).
⇒ path('X', 'Y') :- edge('X', 'Z'), path('Z', 'Y').
>>> path(a, X)?
⇒ path('a', 'b')
⇒ path('a', 'c')
⇒ path('a', 'd')
```

We could also ask for all paths -

```
>>> path(X, Y)?
⇒ path('b', 'c')
⇒ path('a', 'b')
⇒ path('c', 'd')
⇒ path('b', 'd')
⇒ path('a', 'c')
⇒ path('a', 'd')
```
Datalog also supports negation. Within a rule, a tuple prefixed with
`~` becomes a negative statement. This allows us to express "does not
exist" relations, or antijoins. Note that this is only possible by
making the [closed world assumption](https://en.wikipedia.org/wiki/Closed-world_assumption).
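To sketch how this reads (an illustrative transcript - the `one_way`
rule name is invented here, and responses follow the formatting of the
examples above), a negated clause can select the edges of our graph
which have no reverse edge -

```
>>> one_way(X, Y) :- edge(X, Y), ~edge(Y, X).
>>> one_way(a, X)?
⇒ one_way('a', 'b')
```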
Datalog also supports binary equality as a special relation. `=(X, Y)?`
is a nonsense query on its own, because the spaces of `X` and `Y` are
undefined. However, within a rule body, equality (and negated equality
statements!) can be quite useful.
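As an illustrative sketch (the `self_edge` rule name is invented, and
the clause reuses the `=(X, Y)` form described above), equality inside
a rule body constrains two variables to the same value -

```
>>> self_edge(X) :- edge(X, Y), =(X, Y).
>>> self_edge(X)?
⇒ Ø
```

Our example graph contains no self-edges, so the query yields the
empty set.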
For convenience, the Datalog interpreter supports "retracting"
(deleting) tuples and rules. `edge(a, b)!` would retract that constant
tuple, but we cannot retract `path(a, b)!`, as that tuple is generated
by a rule. We can however retract the rules - `path(X, Y)!` would
remove both path production rules from the database.
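Concretely (a transcript in the style of the examples above),
retracting the same tuple twice succeeds once and then fails, since
the tuple is gone -

```
>>> edge(a, b)!
⇒ edge('a', 'b')
>>> edge(a, b)!
⇒ Ø
```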
The Datalog interpreter also supports reading tuples (and rules) from
one or more files, each specified by the `--db <filename>` command
line argument.

## Usage

`pip install --user arrdem.datalog.shell`

This will install the `datalog` interpreter into your user-local
python `bin` directory, and pull down the core `arrdem.datalog` engine
as well.
## Status

This is, to my knowledge, a complete implementation of a traditional datalog.

Support is included for binary `=` as a builtin relation, and for negated terms
in rules (prefixed with `~`).

Rules, and the recursive evaluation of rules, are supported, with some guards to
prevent infinite recursion.

The interactive interpreter supports definitions (terms ending in `.`),
retractions (terms ending in `!`) and queries (terms ending in `?`); see the
interpreter's `.help` response for more details.

### Limitations

Recursion may have some completeness bugs. I have not yet encountered any, but I
also don't have a strong proof of correctness for the recursive evaluation of
rules yet.

The current implementation of negated clauses CANNOT propagate positive
information. This means that negated clauses can only be used in conjunction
with positive clauses. It's not clear whether this is an essential limitation.

There is as yet no query planner - not even segmenting rules and tuples by
relation to restrict evaluation. This means that the complexity of a query is
`O(dataset * term count)`, which is clearly less than ideal.
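That cost model can be illustrated with a small sketch in Python. This
is not the engine's actual code - the names `unify` and `naive_select`
are invented here - but it shows why, without indexing, every clause
of a query must be matched against every tuple in the dataset.

```python
def unify(pattern, fact, bindings):
    """Try to extend `bindings` so that `pattern` matches `fact`.

    Capitalized strings stand in for logic variables. Returns the
    extended bindings, or None on mismatch."""
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p[:1].isupper():  # a logic variable: bind it, or check the binding
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:  # a constant: must match exactly
            return None
    return bindings


def naive_select(db, clauses):
    """Enumerate all bindings satisfying every clause.

    Each clause scans every tuple in `db`, hence the
    O(dataset * term count) behavior described above."""
    solutions = [{}]
    for name, *pattern in clauses:
        solutions = [
            extended
            for bindings in solutions
            for fact_name, *fact in db
            if fact_name == name
            for extended in [unify(pattern, fact, bindings)]
            if extended is not None
        ]
    return solutions


db = [("edge", "a", "b"), ("edge", "b", "c"), ("edge", "c", "d")]
# All paths of length two: edge(X, Z), edge(Z, Y)
print(naive_select(db, [("edge", "X", "Z"), ("edge", "Z", "Y")]))
```

A real engine would at minimum segment tuples by relation name so that
a clause only scans its own relation, which is exactly the planning
step the text notes is missing.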
## License

Mirrored from https://git.arrdem.com/arrdem/datalog-py

Published under the MIT license. See [LICENSE.md](LICENSE.md).
projects/datalog-shell/__main__.py (new executable file)
@@ -0,0 +1,263 @@
#!/usr/bin/env python3

__doc__ = f"""
Datalog (py)
============

An interactive datalog interpreter with commands and persistence

Commands
~~~~~~~~
.help (this message)
.all  display all tuples
.quit to exit the REPL

To exit, use control-c or control-d

The interpreter
~~~~~~~~~~~~~~~

The interpreter reads one line at a time from stdin.
Lines are either
 - definitions (ending in .),
 - queries (ending in ?)
 - retractions (ending in !)

A definition may contain arbitrarily many datalog tuples and rules.

    edge(a, b). edge(b, c). % A pair of definitions
    ⇒ edge(a, b). % The REPL's response that it has been committed
    ⇒ edge(b, c).

A query may contain definitions, but they exist only for the duration of the query.

    edge(X, Y)? % A query which will enumerate all 2-edges
    ⇒ edge(a, b).
    ⇒ edge(b, c).

    edge(c, d). edge(X, Y)? % A query with a local tuple
    ⇒ edge(a, b).
    ⇒ edge(b, c).
    ⇒ edge(c, d).

A retraction may contain only one tuple or clause, which will be expunged.

    edge(a, b)! % This tuple is in our dataset
    ⇒ edge(a, b) % So deletion succeeds

    edge(a, b)! % This tuple is no longer in our dataset
    ⇒ Ø % So deletion fails

"""
import argparse
import logging
import sys

from datalog.debris import Timing
from datalog.evaluator import select
from datalog.reader import pr_str, read_command, read_dataset
from datalog.types import (
    CachedDataset,
    Constant,
    Dataset,
    LVar,
    PartlyIndexedDataset,
    Rule,
    TableIndexedDataset,
)

from prompt_toolkit import print_formatted_text, prompt, PromptSession
from prompt_toolkit.formatted_text import FormattedText
from prompt_toolkit.history import FileHistory
from prompt_toolkit.styles import Style
from yaspin import Spinner, yaspin


STYLE = Style.from_dict({
    # User input (default text).
    "": "",
    "prompt": "ansigreen",
    "time": "ansiyellow",
})

SPINNER = Spinner(["|", "/", "-", "\\"], 200)
class InterpreterInterrupt(Exception):
    """An exception used to break the prompt or evaluation."""


def print_(fmt, **kwargs):
    print_formatted_text(FormattedText(fmt), **kwargs)


def print_db(db):
    """Render a database for debugging."""

    for e in db.tuples():
        print(f"⇒ {pr_str(e)}")

    for r in db.rules():
        print(f"⇒ {pr_str(r)}")
def main(args):
    """REPL entry point."""

    if args.db_cls == "simple":
        db_cls = Dataset
    elif args.db_cls == "cached":
        db_cls = CachedDataset
    elif args.db_cls == "table":
        db_cls = TableIndexedDataset
    elif args.db_cls == "partly":
        db_cls = PartlyIndexedDataset

    print(f"Using dataset type {db_cls}")

    session = PromptSession(history=FileHistory(".datalog.history"))
    db = db_cls([], [])

    if args.dbs:
        for db_file in args.dbs:
            try:
                with open(db_file, "r") as f:
                    db = db.merge(read_dataset(f.read()))
                    print(f"Loaded {db_file} ...")
            except Exception as e:
                print(f"Internal error - {e}")
                print(f"Unable to load db {db_file}, skipping")
    while True:
        try:
            line = session.prompt([("class:prompt", ">>> ")], style=STYLE)
        except (InterpreterInterrupt, KeyboardInterrupt):
            continue
        except EOFError:
            break

        if line == ".all":
            op = ".all"
        elif line == ".dbg":
            op = ".dbg"
        elif line == ".quit":
            break

        elif line in {".help", "help", "?", "??", "???"}:
            print(__doc__)
            continue

        elif line.split(" ")[0] == ".log":
            op = ".log"

        else:
            try:
                op, val = read_command(line)
            except Exception as e:
                print("Got an unknown command or syntax error, can't tell which")
                continue
        # Definition merges on the DB
        if op == ".all":
            print_db(db)

        # .dbg drops to a debugger shell so you can poke at the instance objects (database)
        elif op == ".dbg":
            import pdb
            pdb.set_trace()

        # .log sets the log level - badly
        elif op == ".log":
            level = line.split(" ")[1].upper()
            try:
                ch.setLevel(getattr(logging, level))
            except BaseException:
                print(f"Unknown log level {level}")

        elif op == ".":
            # FIXME (arrdem 2019-06-15):
            #   Syntax rules the parser doesn't impose...
            try:
                for rule in val.rules():
                    assert not rule.free_vars, f"Rule contains free variables {rule.free_vars!r}"

                for tuple in val.tuples():
                    assert not any(isinstance(e, LVar) for e in tuple), f"Tuples cannot contain lvars - {tuple!r}"

            except BaseException as e:
                print(f"Error: {e}")
                continue

            db = db.merge(val)
            print_db(val)
        # Queries execute - note that rules as queries have to be temporarily merged.
        elif op == "?":
            # In order to support ad-hoc rules (joins), we have to generate a transient "query" database
            # by bolting the rule on as an overlay to the existing database. If of course we have a join.
            #
            # `val` was previously assumed to be the query pattern. Introduce `qdb`, now used as the
            # database to query and "fix" `val` to be the temporary rule's pattern.
            #
            # We use a new db and db local so that the ephemeral rule doesn't persist unless the user
            # later `.` defines it.
            #
            # Unfortunately doing this merge does nuke caches.
            qdb = db
            if isinstance(val, Rule):
                qdb = db.merge(db_cls([], [val]))
                val = val.pattern

            with yaspin(SPINNER) as spinner:
                with Timing() as t:
                    try:
                        results = list(select(qdb, val))
                    except KeyboardInterrupt:
                        print(f"Evaluation aborted after {t}")
                        continue

            # It's kinda bogus to move sorting out but oh well
            results = sorted(results)

            for _results, _bindings in results:
                _result = _results[0]  # select only selects one tuple at a time
                print(f"⇒ {pr_str(_result)}")

            # So we can report empty sets explicitly.
            if not results:
                print("⇒ Ø")

            print_([("class:time", f"Elapsed time - {t}")], style=STYLE)
        # Retractions try to delete, but may fail.
        elif op == "!":
            if val in db.tuples() or val in [r.pattern for r in db.rules()]:
                db = db_cls([u for u in db.tuples() if u != val],
                            [r for r in db.rules() if r.pattern != val])
                print(f"⇒ {pr_str(val)}")
            else:
                print("⇒ Ø")
parser = argparse.ArgumentParser()

# Select which dataset type to use
parser.add_argument("--db-type",
                    choices=["simple", "cached", "table", "partly"],
                    help="Choose which DB to use (default partly)",
                    dest="db_cls",
                    default="partly")

parser.add_argument("--load-db", dest="dbs", action="append",
                    help="Datalog files to load first.")

if __name__ == "__main__":
    args = parser.parse_args(sys.argv[1:])

    logger = logging.getLogger("arrdem.datalog")
    ch = logging.StreamHandler()
    ch.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    ch.setFormatter(formatter)
    logger.addHandler(ch)

    main(args)
projects/datalog-shell/setup.py (new file)
@@ -0,0 +1,35 @@
from setuptools import setup


setup(
    name="arrdem.datalog.shell",
    # Package metadata
    version="0.0.2",
    license="MIT",
    description="A shell for my datalog engine",
    long_description=open("README.md").read(),
    long_description_content_type="text/markdown",
    author="Reid 'arrdem' McKenzie",
    author_email="me@arrdem.com",
    url="https://git.arrdem.com/arrdem/datalog-shell",
    classifiers=[
        "License :: OSI Approved :: MIT License",
        "Development Status :: 3 - Alpha",
        "Intended Audience :: Developers",
        "Topic :: Database",
        "Topic :: Database :: Database Engines/Servers",
        "Topic :: Database :: Front-Ends",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
    ],

    scripts=[
        "bin/datalog",
    ],
    install_requires=[
        "arrdem.datalog~=2.0.0",
        "prompt_toolkit==2.0.9",
        "yaspin==0.14.3",
    ],
)
@@ -8,20 +8,27 @@ autoflake==1.4
Babel==2.9.0
beautifulsoup4==4.9.3
black==20.8b1
bleach==3.3.0
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==7.1.2
colorama==0.4.4
commonmark==0.9.1
coverage==5.5
cryptography==3.4.7
docutils==0.17
idna==2.10
imagesize==1.2.0
importlib-metadata==4.0.1
iniconfig==1.1.1
isodate==0.6.0
isort==5.8.0
jedi==0.18.0
jeepney==0.6.0
Jinja2==2.11.3
jsonschema==3.2.0
keyring==23.0.1
livereload==2.6.3
lxml==4.6.3
m2r==0.2.1
@@ -35,10 +42,12 @@ openapi-spec-validator==0.3.0
packaging==20.9
parso==0.8.2
pathspec==0.8.1
pkginfo==1.7.0
pluggy==0.13.1
prompt-toolkit==3.0.18
pudb==2020.1
py==1.10.0
pycparser==2.20
pyflakes==2.3.1
Pygments==2.8.1
pyparsing==2.4.7
@@ -48,10 +57,14 @@ pytest-cov==2.11.1
pytest-pudb==0.7.0
pytz==2021.1
PyYAML==5.4.1
readme-renderer==29.0
recommonmark==0.7.1
redis==3.5.3
regex==2021.4.4
requests==2.25.1
requests-toolbelt==0.9.1
rfc3986==1.5.0
SecretStorage==3.3.1
six==1.15.0
snowballstemmer==2.1.0
soupsieve==2.2.1
@@ -67,6 +80,8 @@ sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
toml==0.10.2
tornado==6.1
tqdm==4.60.0
twine==3.4.1
typed-ast==1.4.2
typing-extensions==3.7.4.3
unify==0.5
@@ -74,5 +89,8 @@ untokenize==0.1.1
urllib3==1.26.4
urwid==2.1.2
wcwidth==0.2.5
webencodings==0.5.1
yamllint==1.26.1
yarl==1.6.3
yaspin==1.5.0
zipp==3.4.1