Initial zapp state (#1)

This commit implements zapp! and rules_zapp, as a proof of concept of
lighter-weight easily hacked on self-extracting zipapps. Think Pex or
Shiv but with less behavior and more user control. Or at least
hackability.
Reid D McKenzie 2021-08-08 00:16:37 -06:00 committed by GitHub
parent 245f4a0cba
commit a6d15bfc83
11 changed files with 802 additions and 0 deletions


@ -8,6 +8,7 @@ And so I'm going the other way; Bazel in a monorepo with several subprojects so
- [Datalog](projects/datalog)
- [Flowmetal](projects/flowmetal)
- [YAML Schema](projects/yamlschema)
- [zapp](projects/zapp)
## License

32
projects/zapp/BUILD Normal file

@ -0,0 +1,32 @@
package(default_visibility = ["//visibility:public"])

load("zapp.bzl",
     "zapp_binary",
)

# Bootstrapping Zapp using py_binary
py_binary(
    name = "zappc",
    main = "src/python/zapp/compiler/__main__.py",
    imports = [
        "src/python",
    ],
)

# Zapp plugins used as a runtime library by rules_zapp
py_library(
    name = "zapp_support",
    srcs = glob(["src/python/zapp/support/**/*.py"]),
    imports = [
        "src/python",
    ],
)

# Zapped zapp
zapp_binary(
    name = "zapzap",
    main = "src/python/zapp/__main__.py",
    imports = [
        "src/python",
    ],
)

124
projects/zapp/README.md Normal file

@ -0,0 +1,124 @@
# Zapp
<img align="right" src="zapp.jpg" alt="Spaceman spiff sets his zorcher to shake and bake" width=250>
Zapp is a comically-named tool for making Python [zipapps](https://www.python.org/dev/peps/pep-0441/).
Zipapps, or zapps as we call them (hence the raygun theme), are packagings of Python programs into zip files. Zapp is
comparable to [Pex](https://github.com/pantsbuild/pex/), [Subpar](https://github.com/google/subpar/) and
[Shiv](https://github.com/linkedin/shiv/) in intent, but shares the most with Subpar in particulars: like Subpar, Zapp
is designed for use with Bazel (and is co-developed with appropriate Bazel build rules).
## A quick overview of zipapps
A Python zipapp is a file with two parts - a "plain" text file with a "shebang" specifying a Python interpreter, followed by a ZIP formatted archive after the newline.
This is (for better or worse) a valid ZIP format archive, as the specification does not preclude prepended data.
When Python encounters a zipapp, it assumes you meant `PYTHONPATH=your.zip <shebang> -m __main__`.
See [the upstream docs](https://docs.python.org/3/library/zipapp.html#the-python-zip-application-archive-format).
So not only must `zapp` generate a prefix script, it needs to insert a `__main__.py` that'll delegate to your application.
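To make the two-part layout concrete, here's a minimal sketch that builds a zipapp by hand using only the standard library. The helper name `build_minimal_zipapp` and the one-line `__main__.py` are made up for illustration; this is not the `zappc` implementation, just the same shebang-then-archive trick in miniature.

``` python
import os
import stat
import subprocess
import sys
import tempfile
import zipfile


def build_minimal_zipapp(path, shebang="/usr/bin/env python3"):
    """Write a shebang line, then append a ZIP archive holding __main__.py.

    Hypothetical helper for illustration only.
    """
    # Part one: the "plain" text prefix.
    with open(path, "w") as fp:
        fp.write("#!" + shebang + "\n")
    # Part two: opening an existing non-ZIP file in append mode makes
    # zipfile add a fresh archive after the bytes that are already there.
    with zipfile.ZipFile(path, "a") as zf:
        zf.writestr("__main__.py", "print('hello from a zipapp')\n")
    # Mark the file executable so the shebang can take effect.
    os.chmod(path, os.stat(path).st_mode | stat.S_IEXEC)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        app = build_minimal_zipapp(os.path.join(tmp, "hello"))
        # Run it with the current interpreter instead of relying on the shebang.
        result = subprocess.run([sys.executable, app], capture_output=True, text=True)
        print(result.stdout, end="")
```

Handing the file to an interpreter directly, as above, sidesteps the shebang, which is handy when `/usr/bin/env python3` isn't the Python you want.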
## A quick overview of zapp
Zapp is really two artifacts: `zapp.bzl`, which defines macros (`zapp_binary`, `zapp_test`) and rule implementations on top of `rules_python`, and the `zappc` "compiler".
The Bazel macros work together with `zappc` to make producing zapps from Bazel convenient.
## A demo
So let's give zapp a spin.
``` shellsession
$ cd projects/zapp/example
$ cat BUILD
load("//projects/zapp:zapp.bzl",
"zapp",
"zapp_binary",
)
# ...
zapp_binary(
name = "hello_deps",
main = "hello.py",
deps = [
py_requirement("pyyaml"),
]
)
```
In this directory there's the `zapp` compiler itself, and a couple of `hello_*` targets that are variously zapped.
One uses `imports`, one is
Let's try `bazel build :hello_deps`
``` shellsession
$ bazel build :hello_deps
INFO: Analyzed target //projects/zapp/example:hello_deps (22 packages loaded, 70 targets configured).
INFO: Found 1 target...
INFO: From Building zapp file //projects/zapp/example:hello_deps:
{'manifest': {'entry_point': 'projects.zapp.example.hello',
'prelude_points': ['zapp.support.unpack:unpack_deps'],
'shebang': '/usr/bin/env python3',
'sources': {'__init__.py': None,
'projects/__init__.py': None,
'projects/zapp/__init__.py': None,
'projects/zapp/example/__init__.py': None,
'projects/zapp/example/hello.py': 'projects/zapp/example/hello.py',
'zapp/__init__.py': None,
'zapp/manifest.json': 'bazel-out/k8-fastbuild/bin/projects/zapp/example/hello_deps.zapp-manifest.json',
'zapp/support/__init__.py': None,
'zapp/support/manifest.py': 'projects/zapp/src/python/zapp/support/manifest.py',
'zapp/support/unpack.py': 'projects/zapp/src/python/zapp/support/unpack.py'},
'wheels': {'PyYAML-5.4.1-cp39-cp39-manylinux1_x86_64.whl':
{'hashes': [],
'source': 'external/arrdem_source_pypi/pypi__pyyaml/PyYAML-5.4.1-cp39-cp39-manylinux1_x86_64.whl'}},
'zip_safe': True},
'opts': {'debug': True,
'manifest': 'bazel-out/k8-fastbuild/bin/projects/zapp/example/hello_deps.zapp-manifest.json',
'output': 'bazel-out/k8-fastbuild/bin/projects/zapp/example/hello_deps'}}
Target //projects/zapp/example:hello_deps up-to-date:
bazel-bin/projects/zapp/example/hello_deps
INFO: Elapsed time: 0.497s, Critical Path: 0.13s
INFO: 8 processes: 7 internal, 1 linux-sandbox.
INFO: Build completed successfully, 8 total actions
```
Here, I've got the `zapp` compiler configured to debug what it's doing.
This is a bit unusual, but it's convenient for peeking under the hood.
The manifest which `zapp` consumes describes how files (and wheels, more on that in a bit) are relocated from the Bazel source tree, per Python `imports = [...]` specifiers, to locations in the container/logical filesystem within the zip archive.
We can see that the actual `hello.py` file (known as `projects/zapp/example/hello.py` within the repo) is being mapped into the zip archive without relocation.
We can also see that a `PyYAML` wheel is marked for inclusion in the archive.
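As a rough sketch of that relocation, here's a simplified Python rendition of what `_store_path` in `zapp.bzl` does with import roots. The function names `store_path` and `module_ref` are illustrative, and the external-workspace handling is left out; the example paths come from the manifest printed above.

``` python
def store_path(path, import_roots):
    """Map a source-tree path to its location inside the archive.

    Simplified, hypothetical port: strips the first matching import root.
    """
    for root in import_roots:
        if path.startswith(root):
            path = path[len(root):]
            break
    return path.lstrip("/")


def module_ref(stored_path):
    """Turn an archive path such as 'a/b/c.py' into the dotted name 'a.b.c'."""
    if stored_path.endswith(".py"):
        stored_path = stored_path[:-len(".py")]
    return stored_path.replace("/", ".")


if __name__ == "__main__":
    roots = ["projects/zapp/src/python"]
    # The support library lives under an import root, so it is relocated.
    print(store_path("projects/zapp/src/python/zapp/support/unpack.py", roots))
    # hello.py matches no import root, so it keeps its repo-relative path.
    stored = store_path("projects/zapp/example/hello.py", roots)
    print(stored, "->", module_ref(stored))
```

The dotted name produced for `hello.py` lines up with the `entry_point` value in the manifest.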
If we run the produced zipapp -
``` shellsession
$ bazel run :hello_deps
INFO: Analyzed target //projects/zapp/example:hello_deps (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //projects/zapp/example:hello_deps up-to-date:
bazel-bin/projects/zapp/example/hello_deps
INFO: Elapsed time: 0.068s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
- /home/arrdem/.cache/zapp/wheels/PyYAML-5.4.1-cp39-cp39-manylinux1_x86_64.whl
- /home/arrdem/.cache/bazel/_bazel_arrdem/6259d2555f41e1db0292a7d7f00f78ca/execroot/arrdem_source/bazel-out/k8-fastbuild/bin/projects/zapp/example/hello_deps
- /usr/lib/python39.zip
- /usr/lib/python3.9
- /usr/lib/python3.9/lib-dynload
- /home/arrdem/.virtualenvs/source/lib/python3.9/site-packages
hello, world!
I have YAML! and nothing to do with it. /home/arrdem/.cache/zapp/wheels/PyYAML-5.4.1-cp39-cp39-manylinux1_x86_64.whl/yaml/__init__.py
```
Here we can see that zapp, when executed, unpacked the wheel from the archive into a cache, inserted that cached wheel into `sys.path`, and correctly delegated to our `hello.py` script, which was able to `import yaml` from the packaged wheel! 🎉
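The caching step can be sketched on its own, without Bazel. Below, a stand-in "wheel" (just a zip holding one pure-Python module) is stored under `.deps/` in an outer archive, copied out into a cache directory, and put on `sys.path`. The names `unpack_wheel` and `zapp_demo_pkg` are invented for this example, and a temp directory stands in for `~/.cache/zapp/wheels`.

``` python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def unpack_wheel(archive, wheel_name, cache_dir):
    """Copy one wheel out of the archive's .deps/ tree and put it on sys.path.

    Hypothetical helper mirroring the touch-or-extract logic of unpack_deps().
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / wheel_name
    if cached.exists():
        # Already cached: just refresh its modification time.
        cached.touch()
    else:
        with zipfile.ZipFile(archive) as zf:
            cached.write_bytes(zf.read(".deps/" + wheel_name))
    sys.path.insert(0, str(cached))
    return cached


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Build the stand-in wheel.
        wheel = tmp / "demo-0.0.0-py3-none-any.whl"
        with zipfile.ZipFile(wheel, "w") as zf:
            zf.writestr("zapp_demo_pkg.py", "GREETING = 'hello from the wheel'\n")
        # Store it inside an outer archive under .deps/.
        outer = tmp / "app.zip"
        with zipfile.ZipFile(outer, "w") as zf:
            zf.write(wheel, ".deps/" + wheel.name)
        cached = unpack_wheel(outer, wheel.name, tmp / "cache" / "wheels")
        print(importlib.import_module("zapp_demo_pkg").GREETING)
        sys.path.remove(str(cached))
```

One caveat: importing straight from a zip on `sys.path` only works for pure-Python modules, since `zipimport` can't load compiled extensions. Wheels that hard-depend on native code would likely need to be unpacked to real directories.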
## License
Copyright Reid 'arrdem' McKenzie August 2021.
Published under the terms of the MIT license.


@ -0,0 +1,18 @@
load("//projects/zapp:zapp.bzl",
     "zapp_binary",
)

zapp_binary(
    name = "hello_script",
    main = "hello.py",
    # entry_point is inferred from main =
)

zapp_binary(
    name = "hello_deps",
    main = "hello.py",
    # deps also get zapped via their underlying wheels
    deps = [
        py_requirement("pyyaml"),
    ]
)


@ -0,0 +1,18 @@
import sys


def main():
    for e in sys.path:
        print(" -", e)

    print("hello, world!")

    try:
        import yaml
        print("I have YAML! and nothing to do with it.", yaml.__file__)
    except ImportError:
        print("Don't have YAML.")


if __name__ == "__main__":
    main()


@ -0,0 +1,133 @@
"""
The Zapp compiler.
"""
import argparse
import io
import json
import os
import sys
import zipfile
import pathlib
import stat
parser = argparse.ArgumentParser(description="The (bootstrap) Zapp compiler")
parser.add_argument("-o", "--out", dest="output", help="Output target file")
parser.add_argument("-d", "--debug", dest="debug", action="store_true", default=False)
parser.add_argument("manifest", help="The (JSON) manifest")
MAIN_TEMPLATE = """\
# -*- coding: utf-8 -*-

\"\"\"Zapp-generated __main__\""\"

from importlib import import_module
# FIXME: This is absolutely implementation details.
# Execing would be somewhat nicer
from runpy import _run_module_as_main

for script in {scripts!r}:
    print(script)
    mod, sep, fn = script.partition(':')
    mod_ok = all(part.isidentifier() for part in mod.split('.'))
    fn_ok = all(part.isidentifier() for part in fn.split('.'))

    if not mod_ok:
        raise RuntimeError("Invalid module reference {{!r}}".format(mod))
    if fn and not fn_ok:
        raise RuntimeError("Invalid function reference {{!r}}".format(fn))

    if mod and fn and False:
        mod = import_module(mod)
        getattr(mod, fn)()
    else:
        _run_module_as_main(mod)
"""


def make_dunder_main(manifest):
    """Generate a __main__.py file for the given manifest."""

    prelude = manifest.get("prelude_points", [])
    main = manifest.get("entry_point")
    scripts = prelude + [main]
    return MAIN_TEMPLATE.format(**locals())


def dir_walk_prefixes(path):
    """Helper. Walk all slices of a path."""

    segments = []
    yield ""
    for segment in path.split("/"):
        segments.append(segment)
        yield os.path.join(*segments)


def generate_dunder_inits(manifest):
    """Hack the manifest to insert __init__ files as needed."""

    sources = manifest["sources"]

    for input_file in list(sources.keys()):
        for path in dir_walk_prefixes(os.path.dirname(input_file)):
            init_file = os.path.join(path, "__init__.py")
            if init_file not in sources:
                sources[init_file] = ""

    return manifest


def generate_manifest(opts, manifest):
    """Insert the manifest.json file."""

    manifest["sources"]["zapp/manifest.json"] = opts.manifest

    return manifest


def main():
    opts, args = parser.parse_known_args()

    with open(opts.manifest) as fp:
        manifest = json.load(fp)

    manifest = generate_manifest(opts, manifest)
    # Patch the manifest to insert needed __init__ files
    # NOTE: This has to be the LAST thing we do
    manifest = generate_dunder_inits(manifest)

    if opts.debug:
        from pprint import pprint
        pprint({
            "opts": {k: getattr(opts, k) for k in dir(opts) if not k.startswith("_")},
            "manifest": manifest
        })

    with open(opts.output, 'w') as zapp:
        shebang = "#!" + manifest["shebang"] + "\n"
        zapp.write(shebang)

    # Now we're gonna build the zapp from the manifest
    with zipfile.ZipFile(opts.output, 'a') as zapp:

        # Append the __main__.py generated record
        zapp.writestr("__main__.py", make_dunder_main(manifest))

        # Append user-specified sources
        for dest, src in manifest["sources"].items():
            if src == "":
                zapp.writestr(dest, "")
            else:
                zapp.write(src, dest)

        # Append user-specified libraries
        # FIXME

    zapp = pathlib.Path(opts.output)
    zapp.chmod(zapp.stat().st_mode | stat.S_IEXEC)


if __name__ == "__main__" or 1:
    main()


@ -0,0 +1,153 @@
"""
The Zapp compiler.
"""
import argparse
import io
import json
import os
import sys
import zipfile
import pathlib
import stat
parser = argparse.ArgumentParser(description="The (bootstrap) Zapp compiler")
parser.add_argument("-o", "--out", dest="output", help="Output target file")
parser.add_argument("-d", "--debug", dest="debug", action="store_true", default=False)
parser.add_argument("manifest", help="The (JSON) manifest")
MAIN_TEMPLATE = """\
# -*- coding: utf-8 -*-

\"\"\"Zapp-generated __main__\""\"

from importlib import import_module
import os
import sys
# FIXME: This is absolutely implementation details.
# Execing would be somewhat nicer
from runpy import _run_module_as_main

for script in {scripts!r}:
    mod, sep, fn = script.partition(':')
    mod_ok = all(part.isidentifier() for part in mod.split('.'))
    fn_ok = all(part.isidentifier() for part in fn.split('.'))

    if not mod_ok:
        raise RuntimeError("Invalid module reference {{!r}}".format(mod))
    if fn and not fn_ok:
        raise RuntimeError("Invalid function reference {{!r}}".format(fn))

    if mod and fn:
        mod = import_module(mod)
        getattr(mod, fn)()
    else:
        _run_module_as_main(mod)
"""


def make_dunder_main(manifest):
    """Generate a __main__.py file for the given manifest."""

    prelude = manifest.get("prelude_points", [])
    main = manifest.get("entry_point")
    scripts = prelude + [main]
    return MAIN_TEMPLATE.format(**locals())


def dir_walk_prefixes(path):
    """Helper. Walk all slices of a path."""

    segments = []
    yield ""
    for segment in path.split("/"):
        segments.append(segment)
        yield os.path.join(*segments)


def generate_dunder_inits(manifest):
    """Hack the manifest to insert __init__ files as needed."""

    sources = manifest["sources"]

    for input_file in list(sources.keys()):
        for path in dir_walk_prefixes(os.path.dirname(input_file)):
            init_file = os.path.join(path, "__init__.py")
            if init_file not in sources:
                sources[init_file] = None

    return manifest


def insert_manifest_json(opts, manifest):
    """Insert the manifest.json file."""

    manifest["sources"]["zapp/manifest.json"] = opts.manifest

    return manifest


def enable_unzipping(manifest):
    """Inject unzipping behavior as needed."""

    if manifest["wheels"]:
        manifest["prelude_points"].append("zapp.support.unpack:unpack_deps")

    # FIXME:
    # if not manifest["zip_safe"]:
    #     enable a similar injection for unzipping

    return manifest


def main():
    opts, args = parser.parse_known_args()

    with open(opts.manifest) as fp:
        manifest = json.load(fp)

    manifest = insert_manifest_json(opts, manifest)
    manifest = enable_unzipping(manifest)
    # Patch the manifest to insert needed __init__ files
    # NOTE: This has to be the LAST thing we do
    manifest = generate_dunder_inits(manifest)

    if opts.debug:
        from pprint import pprint
        pprint({
            "opts": {k: getattr(opts, k) for k in dir(opts) if not k.startswith("_")},
            "manifest": manifest
        })

    with open(opts.output, 'w') as zapp:
        shebang = "#!" + manifest["shebang"] + "\n"
        zapp.write(shebang)

    if "__main__.py" in manifest["sources"]:
        print("Error: __main__.py conflict.", file=sys.stderr)
        # Use sys.exit rather than the site-provided exit() helper
        sys.exit(1)

    # Now we're gonna build the zapp from the manifest
    with zipfile.ZipFile(opts.output, 'a') as zapp:

        # Append the __main__.py generated record
        zapp.writestr("__main__.py", make_dunder_main(manifest))

        # Append user-specified sources
        for dest, src in sorted(manifest["sources"].items(),
                                key=lambda x: x[0]):
            if src is None:
                zapp.writestr(dest, "")
            else:
                zapp.write(src, dest)

        # Append user-specified libraries
        for whl, config in manifest["wheels"].items():
            zapp.write(config["source"], ".deps/" + whl)

    zapp = pathlib.Path(opts.output)
    zapp.chmod(zapp.stat().st_mode | stat.S_IEXEC)


if __name__ == "__main__" or 1:
    main()


@ -0,0 +1,17 @@
"""The Zapp runtime manifest API."""
from copy import deepcopy
from importlib.resources import open_text
import json
with open_text("zapp", "manifest.json") as fp:
    _MANIFEST = json.load(fp)


def manifest():
    """Return (a copy) of the runtime manifest."""

    return deepcopy(_MANIFEST)


__all__ = ["manifest"]


@ -0,0 +1,57 @@
"""Conditionally unpack a zapp (and its deps)."""
import sys
import os
from pathlib import Path
from zipfile import ZipFile
from .manifest import manifest
MANIFEST = manifest()
def cache_root() -> Path:
    return Path(os.path.join(os.path.expanduser("~"))) / ".cache" / "zapp"


def cache_wheel_root():
    return cache_root() / "wheels"


def cache_wheel_path(wheel: str) -> Path:
    return cache_wheel_root() / wheel


def cache_zapp_root():
    return cache_root() / "zapps"


def cache_zapp_path(fingerprint):
    return cache_zapp_root() / fingerprint


def unpack_deps():
    """Unpack deps, populating and updating the host's cache."""

    # Create the cache dir as needed
    cache_wheel_root().mkdir(parents=True, exist_ok=True)

    # For each wheel, touch the existing cached wheel or unpack this one.
    with ZipFile(sys.argv[0], "r") as zf:
        for whl, config in MANIFEST["wheels"].items():
            cached_whl = cache_wheel_path(whl)
            if cached_whl.exists():
                cached_whl.touch()
            else:
                with open(cached_whl, "wb") as of:
                    of.write(zf.read(".deps/" + whl))

            sys.path.insert(0, str(cached_whl))


def main():
    """Inspect the manifest."""

    unpack_deps()

249
projects/zapp/zapp.bzl Normal file

@ -0,0 +1,249 @@
"""
An implementation of driving zappc from Bazel.
"""
load("@rules_python//python:defs.bzl", "py_library", "py_binary")
DEFAULT_COMPILER = "//projects/zapp:zappc"
DEFAULT_RUNTIME = "//projects/zapp:zapp_support"
def _store_path(path, ctx, imports):
    """Given a path, prepend the workspace name as the parent directory"""

    # It feels like there should be an easier, less fragile way.
    if path.startswith("../"):
        # External workspace, for example
        # '../protobuf/python/google/protobuf/any_pb2.py'
        stored_path = path[len("../"):]
    elif path.startswith("external/"):
        # External workspace, for example
        # 'external/protobuf/python/__init__.py'
        stored_path = path[len("external/"):]
    else:
        # Main workspace, for example 'mypackage/main.py'
        # stored_path = ctx.workspace_name + "/" + path
        stored_path = path

    matching_prefix = None
    for i in imports:
        if stored_path.startswith(i):
            stored_path = stored_path[len(i):]
            matching_prefix = i
            break

    stored_path = stored_path.lstrip("/")
    return stored_path


def _check_script(point, sources_map):
    """Check that a given 'script' (eg. module:fn ref.) maps to a file in sources."""

    fname = point.split(":")[0].replace(".", "/") + ".py"
    if fname not in sources_map:
        fail("Point %s (%s) is not a known source!" % (fname, sources_map))
def _zapp_impl(ctx):
    """Implementation of zapp() rule"""

    # TODO: Take wheels and generate a .deps/ tree of them, filtering whl/pypi source files from srcs
    whls = []
    for lib in ctx.attr.wheels:
        for f in lib.data_runfiles.files.to_list():
            whls.append(f)

    # TODO: also handle ctx.attr.src.data_runfiles.symlinks
    srcs = [
        f for f in ctx.attr.src.default_runfiles.files.to_list()
    ]

    # Find the list of directories to add to sys.path
    import_roots = [
        r.replace(ctx.workspace_name + "/", "", 1)
        for r in ctx.attr.src[PyInfo].imports.to_list()
    ]

    for r0 in import_roots:
        for r1 in import_roots:
            if r0 == r1:
                continue
            elif r0.startswith(r1):
                # Both values must be passed to % as a tuple
                fail("Import root conflict between %s and %s" % (r0, r1))

    # Dealing with main
    main_py_file = ctx.files.main
    main_py_ref = ctx.attr.entry_point
    if main_py_ref and main_py_file:
        fail("Only one of `main` or `entry_point` should be specified")
    elif main_py_ref:
        # Compute a main module
        main_py_file = main_py_ref.split(":")[0].replace(".", "/") + ".py"
    elif main_py_file:
        # Compute a main module reference
        if len(main_py_file) > 1:
            fail("Expected exactly one .py file, found these: %s" % main_py_file)
        main_py_file = main_py_file[0]
        if main_py_file not in ctx.attr.src.data_runfiles.files.to_list():
            fail("Main entry point [%s] not listed in srcs" % main_py_file, "main")

        # Compute the -m <> equivalent for the 'main' module
        main_py_ref = _store_path(main_py_file.path, ctx, import_roots).replace(".py", "").replace("/", ".")

    # Make a manifest of files to store in the .zapp file. The
    # runfiles manifest is not quite right, so we make our own.
    sources_map = {}

    # Now add the regular (source and generated) files
    for input_file in srcs:
        stored_path = _store_path(input_file.short_path, ctx, import_roots)
        if stored_path:
            local_path = input_file.path
            if stored_path in sources_map and sources_map[stored_path] != '':
                # Both values must be passed to % as a tuple
                fail("File path conflict between %s and %s" % (sources_map[stored_path], local_path))

            sources_map[stored_path] = local_path

    _check_script(main_py_ref, sources_map)
    for p in ctx.attr.prelude_points:
        _check_script(p, sources_map)

    if "__main__.py" in sources_map:
        fail("__main__.py conflict:",
             sources_map["__main__.py"],
             "conflicts with required generated __main__.py")

    # Write the list to the manifest file
    manifest_file = ctx.actions.declare_file(ctx.label.name + ".zapp-manifest.json")
    ctx.actions.write(
        output = manifest_file,
        content = json.encode({
            "shebang": ctx.attr.shebang,
            "sources": sources_map,
            "zip_safe": ctx.attr.zip_safe,
            "prelude_points": ctx.attr.prelude_points,
            "entry_point": main_py_ref,
            "wheels": {w.path.split("/")[-1]: {"hashes": [], "source": w.path} for w in whls},
        }),
        is_executable = False,
    )

    # Run compiler
    ctx.actions.run(
        inputs = [
            manifest_file,
        ] + srcs + whls,
        tools = [],
        outputs = [ctx.outputs.executable],
        progress_message = "Building zapp file %s" % ctx.label,
        executable = ctx.executable.compiler,
        arguments = [
            "--debug",
            "-o", ctx.outputs.executable.path,
            manifest_file.path
        ],
        mnemonic = "PythonCompile",
        use_default_shell_env = True,
    )

    # .zapp file itself has no runfiles and no providers
    return []
zapp = rule(
    attrs = {
        "src": attr.label(mandatory = True),
        "main": attr.label(allow_single_file = True),
        "wheels": attr.label_list(),
        "entry_point": attr.string(),
        "prelude_points": attr.string_list(),
        "compiler": attr.label(
            default = Label(DEFAULT_COMPILER),
            executable = True,
            cfg = "host",
        ),
        "shebang": attr.string(default = "/usr/bin/env python3"),
        "zip_safe": attr.bool(default = True),
        "root_import": attr.bool(default = False),
    },
    executable = True,
    implementation = _zapp_impl,
)
def zapp_binary(name,
                main=None,
                entry_point=None,
                prelude_points=[],
                deps=[],
                imports=[],
                test=False,
                compiler=None,
                zip_safe=True,
                **kwargs):
    """A self-contained, single-file Python program, with a .zapp file extension.

    Args:
      Same as py_binary, but accepts some extra args -

      entry_point:
        The script to run as the main.

      prelude_points:
        Additional scripts (zapp middleware) to run before main.

      compiler:
        Label identifying the zapp compiler to use. You shouldn't need to change this.

      zip_safe:
        Whether to import Python code and read datafiles directly from the zip
        archive. Otherwise, if False, all files are extracted to a temporary
        directory on disk each time the zapp file executes.
    """

    srcs = kwargs.pop("srcs", [])
    if main and main not in srcs:
        srcs.append(main)

    whls = []
    src_deps = []
    for d in deps:
        if d.find("//pypi__") != -1:
            whls.append(d + ":whl")
        else:
            src_deps.append(d)

    py_library(
        name = name + ".whls",
        data = whls,
    )

    py_library(
        name = name + ".lib",
        srcs = srcs,
        deps = (src_deps or []) + [DEFAULT_RUNTIME],
        imports = imports,
        **kwargs
    )

    zapp(
        name = name,
        src = name + ".lib",
        compiler = compiler,
        main = main,
        entry_point = entry_point,
        prelude_points = prelude_points,
        zip_safe = zip_safe,
        wheels = [name + ".whls"],
    )


def zapp_test(name, **kwargs):
    """Same as zapp_binary, just sets the test=True bit."""

    # Supply a default so callers who don't pass test= don't hit a missing-key failure
    kwargs.pop("test", None)
    zapp_binary(name, test=True, **kwargs)

BIN
projects/zapp/zapp.jpg Normal file

Binary file not shown.
