
Build all software in one command with Bazel

Posted on: October 11, 2023 at 11:15 AM

We’ll be looking at the build system called Bazel today. Bazel is developed at Google, where it has served internally as the primary build system for quite some time now.

In this guide, we’ll create a build flow that fetches remote dependencies from the Internet, generates graphics using Python, embeds them into a Go binary, and then enables us to serve those pictures through that Go binary over HTTP — all in one command.

At the time of writing this article, I am a Google employee and I may be biased here, but I’ll still say it: Bazel is one of my favorite projects developed by Google. I usually stay away from tooling that tries to be overly generic and solve a lot of things simultaneously; Bazel, however, really does solve them, and it solves them really well. At its core, Bazel is more or less language agnostic, and it can easily be configured to run arbitrarily complex workflows that can be expressed as a dependency graph, with building obviously being the primary purpose.

This article sticks out among my recent articles, which have been primarily about low-level software and hacking around Linux. I promise Bazel will stay relevant here, but I will follow up on that in another article. In this article, I will stick to Bazel in general: how to use it to build pretty much anything, and how to leverage some of Bazel’s biggest strengths, such as byte-for-byte reproducibility.

Let’s get started.


Who uses Bazel?

If you’ve never heard of Bazel before, you may be wondering whether this is worth your time and whether Bazel is production ready. The answer is clear: it is more than production ready, and it is already powering a lot of mission-critical projects in tech.

Google is obviously the first name that appears as the user here, since Bazel is developed in-house. However, lots of other big names such as Adobe, Canva, Dropbox, Lyft, Nvidia, etc. are listed on the users page.

Although it is mostly language agnostic and really doesn’t impose any project structure on you, Bazel is still a tiny bit opinionated about how some things are done, and we’ll see that as we go. The primary use case, owing to its origins at Google, is probably a monorepo with different technologies mixed in. We’ll see how Bazel gives you a pretty uniform way of handling all the different languages and builds under the hood. The point here is: if you want to build a very small project with Bazel, it may be overkill, and you may not think the initial ceremony Bazel requires is worth your time. However, on any multi-week project and above, with at least 2 technologies mixed in, I personally believe Bazel is worth your time. At the very least, it should be relatively simple to learn (so hold on to this article!), and as a software engineer, you will likely find it handy at some point.

Advantages

Bazel typically advertises 3 main ideas as the selling points:

  1. Speed: aggressive caching and parallelism mean only the parts of the build graph affected by a change get rebuilt.
  2. Correctness: hermetic, reproducible builds, so the same inputs yield the same outputs.
  3. Scale: one tool for many languages and platforms, from a small project all the way up to a giant monorepo.

Disadvantages

Nothing is perfect, and neither is Bazel, so here are some of the things that I have identified as not so great:

  1. The initial ceremony: WORKSPACE and BUILD files everywhere can feel like overkill for small projects.
  2. Wiring up transitive remote dependencies is clunky, as we’ll see; the upcoming module system aims to fix this.
  3. Full hermeticity still takes work: some toolchains, like the default C++ one, reach out to whatever is installed on your system.

Bazelisk: use Bazel without installing anything!

Let’s do something useful with Bazel. Of course, you probably think we have to install Bazel first, but we don’t! The Bazel team offers a tool called Bazelisk, which is a user-friendly wrapper/launcher for Bazel. It’s a statically linked binary that you can just download from GitHub and use as a drop-in replacement for Bazel. What it does is fetch the actual Bazel into a background location and pass all the commands down to it. The Bazelisk binary is so tiny that you can even check it into your project’s version control. That way, anyone can use Bazel for your project without downloading anything; that’s amazing!

I have included Bazelisk for Linux/x86-64 and for macOS/ARM64 (Apple silicon) in the sample project below so you can get started right from there. If your platform is something else, please navigate to the Bazelisk GitHub repo and download the binary for your platform from its releases page (you can fetch the latest release).

GitHub repo for an example project

Please navigate to the sample GitHub monorepo project for this guide and clone it to your machine. We’ll be using it as an example going forward.

Bazel packages and targets

There are 2 main concepts to be understood with Bazel: packages and targets.

Packages are nothing more than collections of targets, which usually means the targets in the same BUILD file within a directory. Basically, if you have a BUILD file in some directory within your project, that’s where your package begins, and most often, where it ends. Most typically in Bazel projects, you will see one BUILD file in every directory of the hierarchy, though that’s not a strict requirement. You can have a BUILD file in one directory and not in its children, and those children are then part of the same Bazel package, though that’s fairly rare. For simplicity, you can go ahead and just use a mental model where a directory corresponds to a Bazel package, and every directory at every level of the hierarchy in your project has a BUILD file, even if it’s empty.

Targets are, put simply, nodes in your build graph. The build graph, as you may have guessed, expresses dependencies for whatever you are trying to build, and it can be arbitrarily deep. Again, it should be intuitively obvious that this graph cannot have cycles. On top of declaring dependencies, a node also declares what it provides. For example, your target can represent a Java library: it will declare what dependencies it has, but it will also provide a JAR file as its output, and the build flow will know about it. In a more complicated example, your node could be in charge of generating a picture, and it could declare multiple outputs in multiple different resolutions. Either way, a build node says exactly what it gives.

The targets are defined in the BUILD files, and they are the core of your build flow. Each target’s behavior is defined by a rule, and we’ll write some custom rules later to leverage the full power of Bazel.

Labels: the identifiers for packages and targets

Packages and targets are identified through labels. Additionally, a common good practice for Bazel (stemming from Google) is to refer to everything within a project by its absolute path. I highly recommend always running Bazel from the root of the project and using the full commands; this makes it much easier to instruct someone else how to reproduce exactly what you’re doing. It may seem a bit tedious at first, but it catches on very quickly.

I vaguely used ‘within a project’ and ‘absolute path’ above, so let me be more specific. You first need to know the root of your project. You declare it simply by putting a WORKSPACE file in the directory that you want to serve as the root. That file may as well remain empty, though it won’t be super useful to you that way. For now, just keep in mind that the existence of the WORKSPACE file declares the root of the project, and that directory is the root package (you should probably stick a BUILD file in there too, and that one may as well remain empty; as I said before, you can happily use Bazel forever by smacking BUILD files into every directory of your project and you’ll be fine). This root is identified by a double slash: //.

Subpackages are then identified just as they are in the filesystem: if you have a BUILD file at /foo/bar/baz, relative to the root of your project, your package is then //foo/bar/baz.

The targets within the package are then identified as a concatenation of your package label, a colon (:) and the target name. For example, if in the package //foo/bar/baz there is a BUILD file containing a java_library called hello, the full label for this library is //foo/bar/baz:hello.
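To make that concrete, here is a minimal sketch of what such a BUILD file at foo/bar/baz might contain (the source file name is made up for illustration):

java_library(
    name = "hello",
    srcs = ["Hello.java"],
)

From the root of the project, you would then build it with bazel build //foo/bar/baz:hello.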

Starlark

Bazel builds are configured through a language called Starlark. It’s a dialect of Python, so it should feel fairly familiar right off the bat. The BUILD, WORKSPACE and .bzl files we’ll talk about are all written in Starlark.
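To give you a quick taste, here is a tiny, made-up .bzl sketch. Starlark has Python’s functions, lists, dicts, string formatting and comprehensions, but deliberately drops things like classes, while loops and unbounded recursion:

def image_labels(names):
    """Turns a list of names into labels in the //server package (illustrative only)."""
    return ["//server:%s.png" % name for name in names]

IMAGE_LABELS = image_labels(["foo", "bar"])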

The WORKSPACE file

As we mentioned above, the existence of a WORKSPACE file declares the root of a Bazel project. The configuration you put in this file is typically project-level. Most often, those are the remote dependencies for your project. Many such configurations, remote deps being one of them, can be set up only in the WORKSPACE file, not in individual BUILD files (Bazel will complain if you try otherwise).

Let’s look at the WORKSPACE file from the sample project on GitHub linked above and look at the important bits:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

Remember how we mentioned that // signifies the root of the project? Bazel can also reference things from other projects, and to do that, you put the project identifier before the // part. In this case, we’re using @bazel_tools. This is sort of a virtual project, something that is built into Bazel itself. However, in a second, we will reference another project for real. Let’s first take a look at http_archive.

http_archive(
    name = "rules_python",
    sha256 = "9d04041ac92a0985e344235f5d946f71ac543f1b1565f2cdbc9a2aaee8adf55b",
    strip_prefix = "rules_python-0.26.0",
    url = "https://github.com/bazelbuild/rules_python/releases/download/0.26.0/rules_python-0.26.0.tar.gz",
)

This bit here literally means: download the .tar.gz file from the specified URL, unpack it, and move all the files up one level, since at the root of this archive there is just a single folder called rules_python-0.26.0. Passing the expected SHA-256 hash is optional, but strongly encouraged. Without it, the reproducibility of your project can easily break if someone simply changes the file at the URL.

What http_archive now makes available for us is a project called @rules_python that can be referenced from now on. So if we want a target from this project that is at //foo/bar:baz, relative to the root of that project, we can reference it as @rules_python//foo/bar:baz.

So what did we just do here? As we mentioned above, rules are there to define the behavior of build targets. The build targets, i.e. the nodes of the build graph we want to establish here, are for Python: we want to define a Python executable that will do something for us. Some of the Bazel rules are built in, and some have to be fetched remotely. My feeling is that the Bazel team eventually wants nothing to be built in and special-cased, and that everything will be pieced together from remote dependencies such as the one above. Either way, you are now getting a feeling for how Bazel supports any language in its build system: you decide which language you want to use, you download the rules for it through the WORKSPACE file (typically from GitHub), and you are free to use that language from that point on in your project.

You may wonder where and when this .tar.gz file gets downloaded. First, it will not pollute your source tree; it is placed in a working directory that Bazel manages for you. After all, you really don’t care about seeing that file, you just want to make sure it’s used in your build flow. As for when it is downloaded, the answer is: whenever it is not cached. On your first build you will certainly download it, but from that point it will be available in the cache, until the cache goes stale or you run a clean operation on your project.
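If you’re curious where that working directory actually lives on your machine, Bazel can tell you; the external dependencies should end up under the external subdirectory of the path this prints:

./bazelisk-darwin-arm64 info output_base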

Let’s move on and see what happens next:

load("@rules_python//python:repositories.bzl", "py_repositories")

Great, this references the @rules_python project we just transparently downloaded from GitHub and loads some .bzl file. If you look closely, you’ll see that we’re referencing a file, but it looks like a Bazel target… It may not make sense now, but I promise it will make perfect sense as we go deeper, and it reveals incredible capabilities of Bazel. For now, you should know that this .bzl file contains some Starlark definitions, such as functions (which, as we said, are basically Python functions), and we’re importing the definition of py_repositories, whatever that is.

Next comes:

py_repositories()

It seems we loaded a function definition above and we’re just calling it here. Simple enough. This is a typical pattern you will see when adding an external project to your Bazel project. They’ll ask you to download the files via http_archive and then call a Starlark function to sort of “click” their project into yours, like assembling Legos. This is mostly to instantiate transitive dependencies of the project you are adding. I know this may seem painful (it sort of is), and the Bazel team is addressing it right now (I’ll comment on the upcoming module system below).

The rest of the WORKSPACE is more or less the same: we’re calling some Starlark functions and loading some more external dependencies into our project. We’ll come back to these later.

The BUILD file

Let’s look at the BUILD file within the root package.

load("@rules_python//python:defs.bzl", "py_binary")

cc_binary(
    name = "hello",
    srcs = ["hello.cc"],
)

py_binary(
  name = "main",
  srcs = ["main.py"],
)

We’re defining our first build targets here! Let’s first look at cc_binary, our little C++ program. Quick note: the cc_* rules do not differentiate between C and C++; it’s all lumped together in the same basket.

So the thing here that looks like a call to a Python function means we’re creating a target hello using a rule called cc_binary. As we said above, this means we have a node in the build graph. This node depends only on the existence of a file called hello.cc. Pay attention to how we loaded py_binary, but not cc_binary: I decided to just use the built-in definition of cc_binary for this example, so there’s no need to load it; it’s simply globally available.

Let’s build this target and see what happens exactly.

Note: from now on, I will use the Bazelisk wrapper within the GitHub repo to do my Bazel builds, and since I’m doing it on my Mac, I’m using the Darwin/ARM Bazelisk. If you have Bazel installed on your machine, just use bazel instead, or whatever you prefer to drive Bazel.

./bazelisk-darwin-arm64 build //:hello

The output I get is the following:

INFO: Analyzed target //:hello (39 packages loaded, 234 targets configured).
INFO: Found 1 target...
Target //:hello up-to-date:
  bazel-bin/hello
INFO: Elapsed time: 0.340s, Critical Path: 0.09s
INFO: 8 processes: 6 internal, 2 darwin-sandbox.
INFO: Build completed successfully, 8 total actions

It seems like it generated a single file called hello within the bazel-bin directory. Let’s peek by running:

file bazel-bin/hello

The output is:

bazel-bin/hello: Mach-O 64-bit executable arm64

Perfect, this is what we wanted, a single binary for a Mac machine (since I’m building on a Mac). Let’s run it!

Running binaries built with Bazel

As we see above, we got a file in bazel-bin called hello. We could go ahead and just execute it.

I got the output:

Hello world

Just as expected.

However, I wouldn’t recommend running binaries inside the codebase that way. What I would suggest instead is the run command. It will create a sandbox directory, put all the file dependencies in the right places, put your binary where it should be, and then execute it. This doesn’t matter for this particular example, so I won’t bother you with the details right now; just keep in mind that it becomes important when you have runfile dependencies. For the rest of this guide, simply remember that it’s a good idea to run things with run, for example:

./bazelisk-darwin-arm64 run //:hello

This gives us the same output. So in this case, it doesn’t matter how we run the binary, since there are no other file dependencies (and ideally there shouldn’t be). We had a single artifact out of our build, and that’s exactly what we ran.

Let’s try the Python binary:

./bazelisk-darwin-arm64 run //:main

Output is:

INFO: Analyzed target //:main (3 packages loaded, 218 targets configured).
INFO: Found 1 target...
Target //:main up-to-date:
  bazel-bin/main
INFO: Elapsed time: 0.134s, Critical Path: 0.01s
INFO: 5 processes: 5 internal.
INFO: Build completed successfully, 5 total actions
INFO: Running command line: bazel-bin/main
Hello world from Python!

Great, we have a uniform way to fire up any executable target: we do it with a run command. Though wait, what is this bazel-bin/main that we ran? What kind of binary could there be? This is Python, we can just interpret it, right?

file bazel-bin/main
bazel-bin/main: Python script text executable, ASCII text

If you cat this file, you’ll see a wall of text. What the Python rules did here is generate a wrapper script that starts your Python script in some particular way, though that doesn’t matter for you in this case. Now, when something like this happens, there may be a difference between using the run command and directly executing whatever you see in the build output. The former will arrange the directories neatly for you before running; the latter won’t necessarily do it. And when I say arrange directories neatly, I mean put your wrapper where it should be and everything it depends on in the right spots.

You may wonder, then, how you distribute the software to your prod environment. Well, if you really have file dependencies like you have here (though implicitly), you have to replicate that layout in your production environment. However, the ideal is to have a single executable artifact that you can trivially copy to your deployment. We’ll do an example of that with a simple Go server.

Library dependencies

Let’s look at a slightly more complicated example from the same package.

cc_library(
    name = "say",
    srcs = ["say.cc"],
    hdrs = ["say.h"],
)

cc_binary(
    name = "do_say",
    srcs = ["do_say.cc"],
    deps = [
      ":say",
    ],
)

Here we broke our C++ program into a library and the main binary. Let’s first build the library.

./bazelisk-darwin-arm64 build //:say
INFO: Analyzed target //:say (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:say up-to-date:
  bazel-bin/libsay.a
  bazel-bin/libsay.dylib
INFO: Elapsed time: 0.160s, Critical Path: 0.06s
INFO: 3 processes: 2 internal, 1 darwin-sandbox.
INFO: Build completed successfully, 3 total actions

We have two output artifacts here, an .a file and a .dylib file. Since I’m building on a Mac I got the .dylib; had I built on Linux, I’d probably get an .so file instead. The names have also been generated based on the target name say, so we get libsay.a, for example. Basically, what we’re seeing here is the following:

  1. The cc_library target depended on the files say.cc and say.h.
  2. Upon building, the target provided the outputs libsay.a and libsay.dylib in the build graph. The exact mechanism for this is a system called providers, which we’ll briefly explore later.

Now we can have a C++ binary that depends on this library, and the dependency is expressed through the deps attribute. All the parameters you see when instantiating a rule target are called attributes.
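As a side note, you can inspect this dependency graph without building anything at all. Bazel ships a query command; the following should list everything //:do_say depends on, including :say and the individual source files:

./bazelisk-darwin-arm64 query "deps(//:do_say)"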

If you want to see the individual actions that happen when building the do_say binary, you have to add the -s flag to your build. Let’s first clean the cache so we can start from scratch:

./bazelisk-darwin-arm64 clean

The -s flag is short for --subcommands. Let’s run:

./bazelisk-darwin-arm64 build -s //:say

The output is quite verbose so I won’t copy it here, but it shows the individual actions taken.

How Bazel rules work, and writing a custom rule to generate graphics

So far we have used someone else’s rules to get things done, be it through a built-in rule like cc_binary or through a downloaded rule like py_binary. What if we want to define our own nodes in the build graph that do something custom? In this example, let’s define a rule that runs a Python program to generate a PNG graphic file. We’ll use the Pillow library from pip to generate the graphics. In the spirit of Bazel and running everything in one command, we will ensure that this remote dependency on Pillow is fetched dynamically behind the scenes during the build, so we don’t burden the user with the exact mechanics.

Python program for generating graphics

Let’s now go back to the WORKSPACE file briefly.

load("@rules_python//python:pip.bzl", "pip_parse")

pip_parse(name = "pip_deps", requirements_lock = "//:requirements.txt")

load("@pip_deps//:requirements.bzl", "install_deps")

install_deps()

This should ring some bells with the Python developers. There are mentions of pip and the requirements.txt file. The latter describes which pip libraries our Python project depends on. Let’s look at what’s inside it:

Pillow==10.0.1

This says we depend on the Pillow library, pinned at version 10.0.1. Pinning the exact version is a good idea, as it ensures that anyone else building the project gets the same version of the library and things stay fully reproducible.
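We don’t do this in the sample project, but if you want to lock things down even further, the requirements format also supports hash pinning, which pins the exact bytes of the package rather than just the version number (the hash below is a made-up placeholder):

Pillow==10.0.1 --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000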

So our WORKSPACE snippet above ensures that whatever is listed in requirements.txt is downloaded during the build flow and made available to our Python code in the project. Let’s see how that is used by peeking at the //generator package. Before opening generator/BUILD, focus for a second on the pip_parse line above and keep it in mind as you go forward. Here is the generator/BUILD file:

load("@pip_deps//:requirements.bzl", "requirement")
load("@rules_python//python:defs.bzl", "py_binary")

py_binary(
  name = "pillow_generator",
  srcs = ["pillow_generator.py"],
  deps = [
    requirement("Pillow"),
  ],
  visibility = [
    "//visibility:public",
  ],
)

What we see here is that the WORKSPACE file ended up creating another virtual Bazel project called pip_deps, through the pip_parse function. As I mentioned earlier, only the WORKSPACE file can do certain things, and creating these workspaces is one of them. Great, so that virtual project gave us something further called requirement. The Python binary below is formed pretty much the same way as the sample binary in our root package. The only difference is that we’re depending on a library, just like the C++ program did, except this time on something fetched remotely from pip. That is what the requirement function gives us: a way to reference it.

Each mechanism for referencing remote packages in different languages may give you something slightly different. What you see with pip here may be a little different from what Maven would give you for Java.

Important: building this binary won’t install Pillow with pip on your system. Instead, it will just use pip to download the necessary files and place them in the working directory of your build. When you use the run command, this is neatly placed inside the working directory, hence another reason to use run instead of calling the wrapper yourself manually. And again, you may wonder how to deploy this in production if things are this way: welcome to the pains of deploying Python! We’ll see later how it’s easier with Go, and I’ll leave it at that here. :) As a bonus point, I don’t think you need pip installed on your system at all to be able to build; the Python rules we’re using should fetch the pip machinery dynamically too, though I can’t verify that at the moment since I do have pip installed on my Mac. As an exercise, clone this Git repo on a brand new VM and see if what I said is true! :D At worst, you need pip installed for all this to work.

Another new thing we see here is visibility. By default, your targets are visible only within the same package. For example, if you have a py_library here, only other py_library and py_binary targets from the same directory (package, to be more accurate) can access it, unless you tweak the visibility. We made it public here because something else will depend on this program. To quickly outline, you’ll probably only ever do one of the following for visibility (there are more possibilities, but I’m keeping it simple; a small sketch follows the list):

  1. Leave it at default, and just reference the target from the same package.
  2. Make it public like I did above. Anyone can reference it, you don’t care about it.
  3. You want only the targets from the package //foo/bar to access it. In that case, set the visibility to visibility = ["//foo/bar:__pkg__"].
  4. If you want the subpackages to also be able to reference the target, don’t use __pkg__ like above, but __subpackages__ instead.
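For concreteness, here is a sketch of what option 3 might look like for a hypothetical library (the target and file names are invented):

load("@rules_python//python:defs.bzl", "py_library")

py_library(
    name = "internal_helpers",
    srcs = ["internal_helpers.py"],
    visibility = ["//foo/bar:__pkg__"],
)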

Let’s look at the pillow_generator.py file to see what this tool does:

import argparse

from PIL import Image, ImageDraw

def _parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--image',
                        type=str,
                        dest='image',
                        help='Path to where to make the output file')
    parser.add_argument('--message',
                        type=str,
                        dest='message',
                        help='Message to include in the image')
    return parser.parse_args()


if __name__ == '__main__':
    args = _parse_args()

    img = Image.new('RGB', (len(args.message) * 10, 30), color=(0, 0, 0))
    d = ImageDraw.Draw(img)
    d.text((15,10), args.message, fill=(255,255,255))

    img.save(args.image)

This is a super simple program. It takes 2 command line parameters: the path to the output image, and the message we want in the image. It will create a PNG file at the specified path with the text from the --message argument rendered on it. Feel free to run this tool with some parameters to explore.

When doing run with Bazel, though, make sure you add -- between bazel run //package/foo:target and your arguments. Something like:

bazel run //package/foo:target -- --arg1=value1 --arg2=value2

This is a general Unix command line thing.

Custom rules for generating images

We’ve seen before how things like cc_library can produce .a files. Let’s now write our own rule that generates a .png file.

As a reminder, a rule defines what happens when a target is on the requested path in the build graph. Before we dig in, let’s zoom in a little on the phases of the build process in Bazel. This is a very cool concept that’s officially explained in Bazel’s documentation. Basically, there are 3 phases:

  1. Loading
  2. Analysis
  3. Execution

In the loading phase, we discover which targets from the build graph we need for our build. E.g. if we’re building a cc_binary, we need to discover all the cc_library deps on the way, and possibly their transitive dependencies further down. This is also the phase where macros are evaluated. Macros are nothing but Python-like functions that automate the instantiation of targets in some way. For example, if we need, for whatever reason, to create Python library targets like lib1, lib2, lib3, etc., instead of calling py_library 3 times, we can write a function that does it in a for loop for us. We’ll see macros in action later.
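As a sketch of what such a macro could look like in a .bzl file (the names are invented, and this isn’t part of the sample project):

load("@rules_python//python:defs.bzl", "py_library")

def make_libs(count):
    """Instantiates py_library targets lib1 .. libN, each from a matching .py file."""
    for i in range(1, count + 1):
        py_library(
            name = "lib%d" % i,
            srcs = ["lib%d.py" % i],
        )

Calling make_libs(3) from a BUILD file then instantiates three library targets during the loading phase, exactly as if you had written them out by hand.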

In the analysis phase, we discover actions. The rule implementations are executed; these are, again, just Python-like Starlark functions, and what they do is produce a bunch of Python-like objects of what you can think of as an Action class. These objects define concrete steps that need to happen: things like “invoke the C++ compiler to compile this file”, “link those files together into a binary”, etc. Note that at this point, those actions are not executed yet. They are just recorded as something that needs to happen.

Finally, in the execution phase, the actions we discovered above are run and we get the final files that we need.

With this knowledge, let’s head over to the //build_defs package from the sample project. The goal is to be able to do something like this:

pillow_image(
    name = "foo",
    message = "This is foo!",
)

If we have this in a BUILD file, we want to be able to build this target and have a file called foo.png generated with This is foo! rendered in it. Let’s make this happen by looking at build_defs/images.bzl. We’ll start from the middle:

_pillow_image = rule(
    implementation = _pillow_image_impl,
    attrs = {
        "message": attr.string(
            doc = "Message for the generated image",
            mandatory = True,
        ),
        "out": attr.output(
            doc = "Output label for the generated PNG (must end with .png)",
            mandatory = True,
        ),
        "generator": attr.label(
            default = Label("//generator:pillow_generator"),
            allow_files = True,
            executable = True,
            cfg = "exec",
        ),
    },
)

We’ve just defined a new rule called _pillow_image. The leading underscore means it is private for now; we won’t expose it just yet as our API for defining image targets.

This rule has 3 defined attributes:

  1. Message: This is just a simple string that will be rendered in the PNG image. We’ll pass this string to the tool we wrote above.
  2. Out: We’re pre-declaring an output file here. There are also outputs in Bazel that are not pre-declared; the difference is that pre-declared outputs are known at the end of the loading phase, while the others are known only at the end of the analysis phase. I won’t burden you with the details here, but let’s say that if something else depends on what we’re generating, it’s a good idea to pre-declare it.
  3. Generator: This is a label specifying an executable target. By default, this is pointing to the Python image generator that we wrote above. The other parameters here don’t matter at the moment.

These are not all the attributes of this rule, however. There are some attributes that exist by default, such as name, visibility, etc.

Now, let’s scroll down and look at the pillow_image macro that we’ll expose instead of the rule. The macro is just a shorthand for instantiating the rule: it generates the value of the out attribute automatically. Namely, the only way I want out to be used is to simply append .png to the name. That’s really all this macro does:

def pillow_image(name, **kwargs):
    output_label = "%s.png" % name

    _pillow_image(
        name = name,
        out = output_label,
        **kwargs)

You’ll see this often. Rules have some attributes that may be tedious to write out, so rule authors expose a macro that does some automation on top. Really nothing fancy here.

Now let’s get to the core of this rule: the implementation.

def _pillow_image_impl(ctx):
    output_image = ctx.outputs.out
    generator_args = ctx.actions.args()

    generator_args.add("--image", output_image.path)
    generator_args.add("--message", ctx.attr.message)

    ctx.actions.run(
        outputs = [output_image],
        arguments = [generator_args],
        executable = ctx.executable.generator,
    )

    return [
        DefaultInfo(files = depset([output_image]))
    ]

We first figure out which pre-declared output we’re generating; that’s trivially available from the context object. Next, we declare the arguments for the Python tool we wrote before and add their values. Finally, we invoke the generator tool. This is where we create the action for running the tool, and the tool will be invoked in the final execution phase.

The return value of this function is a list of providers. I won’t get into details, but these are just a bunch of structs passed between the rules in the analysis phase. For example, if your rule has an attribute that is a label pointing to some other rule, you can use ctx.attr.my_dependency to access the info provided by that rule. This is how, for example, cc_binary knows about the .a file produced by the cc_library target: the cc_library implementation returns a provider that carries the info about that file. You can define custom providers that return really anything: pointers to generated files, numbers, just about anything Starlark has. In this case, we simply return a DefaultInfo provider and nothing else. Remember how the build command prints some output files? That is how Bazel knows which files have been built: through the DefaultInfo provider.
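To illustrate, defining a custom provider is just one more Starlark call. Here is a hedged sketch of one we could have returned from our rule, but didn’t need to (the provider and its fields are invented):

PillowImageInfo = provider(
    doc = "Carries information about a generated PNG image.",
    fields = ["png", "message"],
)

A rule implementation would then return something like PillowImageInfo(png = output_image, message = ctx.attr.message) alongside DefaultInfo, and any rule depending on it could read those fields during the analysis phase.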

Using the custom rules and generating a server that can host Bazel-generated images

Let’s see those custom rules in action. We’ll switch our attention to the //server package and so let’s open server/BUILD.

load("//build_defs:images.bzl", "pillow_image")

This is straightforward: we’re loading our newly created pillow_image macro. However, we still have the mystery of why we refer to our file images.bzl as a Bazel target when it’s just a plain file. It’s because files are also targets in Bazel. They’re basically implicit targets that don’t have any dependencies. You can think of them as the leaves of our build graph. Our build graph therefore always consists of targets, but the leaves are a special kind of target, ones that don’t require any building at all. It’s quite a smart concept.

pillow_image(
    name = "hello_world",
    message = "Hello world Bazel!",
)

pillow_image(
    name = "foo",
    message = "This is foo!",
)

pillow_image(
    name = "bar",
    message = "This is bar!",
)

[
    pillow_image(
        name = word,
        message = word + word + word,
    )

    for word in ['lorem', 'ipsum']
]

Let’s go ahead and build some of these.

./bazelisk-darwin-arm64 build //server:foo

The output is here:

INFO: Analyzed target //server:foo (6 packages loaded, 344 targets configured).
INFO: Found 1 target...
Target //server:foo up-to-date:
  bazel-bin/server/foo.png
INFO: Elapsed time: 0.925s, Critical Path: 0.76s
INFO: 6 processes: 5 internal, 1 darwin-sandbox.
INFO: Build completed successfully, 6 total actions

Go ahead and open up the foo.png file. It should be a little image with a black background and white text that says “This is foo!”. For fun, I also added a little list comprehension that instantiates 2 more targets. Personally, I think it’s ugly to have logic in a BUILD file, so if I were to do something like this for real, I’d factor it out into a macro that goes into a separate .bzl file.
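If I did factor it out, the macro might look something like this in a .bzl file (a sketch built on the pillow_image macro we already have):

load("//build_defs:images.bzl", "pillow_image")

def pillow_images_for_words(words):
    """Instantiates one pillow_image per word, with the word tripled as the message."""
    for word in words:
        pillow_image(
            name = word,
            message = word + word + word,
        )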

Great, but let’s not stop here; let’s do one more thing before wrapping up this guide. Let’s make an easily deployable Go server that can actually host these images! Back to our WORKSPACE file quickly:

http_archive(
    name = "io_bazel_rules_go",
    sha256 = "91585017debb61982f7054c9688857a2ad1fd823fc3f9cb05048b0025c47d023",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/rules_go/releases/download/v0.42.0/rules_go-v0.42.0.zip",
        "https://github.com/bazelbuild/rules_go/releases/download/v0.42.0/rules_go-v0.42.0.zip",
    ],
)

load("@io_bazel_rules_go//go:deps.bzl", "go_register_toolchains", "go_rules_dependencies")

go_rules_dependencies()

go_register_toolchains(version = "1.21.1")

The first part should be clear by now: we’re downloading the Go rules for Bazel and then instantiating their dependencies (I still owe you an explanation of that pattern, hold tight). Finally, what’s really cool here is that we’re registering a toolchain for Go. Under the hood, this means we don’t even need a Go toolchain installed to make this build. These rules do not depend on the Go toolchain installed on your system, and even if you have one, the rules won’t use it. They are batteries-included, meaning they’ll transparently fetch the Go compiler and everything else needed to build Go programs. We’re also able to lock in a specific version for tight reproducibility.

Let’s go back to server/BUILD now. The top of the file has:

load("@io_bazel_rules_go//go:def.bzl", "go_binary")

This is very clear now. Let’s create a binary that will serve our generated image:

go_binary(
    name = "server",
    srcs = ["server.go"],
    embedsrcs = [
        ":hello_world.png",
        ":foo.png",
        ":bar.png",
        ":lorem.png",
        ":ipsum.png",
    ],
)

Let’s build this target and see what we get:

INFO: Analyzed target //server:server (13 packages loaded, 9810 targets configured).
INFO: Found 1 target...
INFO: From GoLink server/server_/server:
ld: warning: ignoring duplicate libraries: '-lm'
Target //server:server up-to-date:
  bazel-bin/server/server_/server
INFO: Elapsed time: 15.419s, Critical Path: 14.39s
INFO: 16 processes: 7 internal, 9 darwin-sandbox.
INFO: Build completed successfully, 16 total actions

Fun note: //server:server can be shortened to just //server, since the target name can be inferred when it’s the same as the last component of the package path.

Let’s see what this file is:

file bazel-bin/server/server_/server

Output:

bazel-bin/server/server_/server: Mach-O 64-bit executable arm64

This is what we need, and this one is juicy! embedsrcs is a very interesting thing: it enables us to literally embed bytes into the binary itself, so the binary doubles as both an archive of files and an executable. I really like this feature in Go, because if I need some files to go along with my binary, like the images we want to serve over HTTP here, we still end up with a single deployment file. Everything is tightly statically linked and the archive is baked into the binary itself. This file from bazel-bin is all I need to deploy, nothing else! With Python, I have to ensure that the deps I pulled from pip are deployed side by side with the main logic, and that all the .py files are in the right directories… it’s just a pain. This is way simpler! Let’s take a look at the code itself:

package main

import (
	"embed"
	"log"
	"net/http"
)

var (
	//go:embed hello_world.png foo.png bar.png lorem.png ipsum.png
	images embed.FS
)

func main() {
	staticHandler := http.FileServer(http.FS(images))
	err := http.ListenAndServe(":3333", staticHandler)

	if err != nil {
		log.Fatalf("Problem launching the server: %v", err)
	}
}

It should be mostly self-explanatory at this point. Another shoutout to the embed feature, which lets us create this awesome embed.FS object by simply adding the pseudo-comment above.

Feel free to run this server:

./bazelisk-darwin-arm64 run //server

Now go ahead and visit http://localhost:3333 and click around a bit. Do you notice the magic that happened? All you had to do was run a single command, and everything happened transparently in one go. The pip dependencies were fetched, your Python program was put together and executed a couple of times to generate some images, then those images were collected and baked into a binary compiled by a Go toolchain that was downloaded on the go (see what I did there??), without you ever having to worry about having Go installed on your machine. You just hit that one command, waited a little bit, and you were ready to go!

This is how engineers at Google build and run tools, servers, etc. Everything is uniform with the build and run commands (there is also test for running tests, but we won’t cover tests in this guide; I’m sure you can easily figure them out at this point), and all the complex stuff happens like magic. As a reminder, you can always run with the -s flag to see the details of what happened in your build.

The final catch: :image.png vs :image

Did you notice how we depended on :foo.png, rather than just :foo? We pre-declared the output earlier, which, among other things, means that Bazel creates a new label for the generated output file. From that point on, we can use this file just like any other file.

Note that we said before that even real files like server.go are actually targets. That’s still true, and what I’m trying to say here is that you can refer to that file as :server.go too, with a leading colon. It’s just a rare thing to do; in fact, leaving out the colon makes it clear that it’s a file rather than a built target. I personally prefer to add the colon for generated files, to make it clear that they are generated.

Anyway, this is a bit of a convention thing. I hope you got the general concept.

Hermeticity and vendoring

If you read about Bazel and Google’s building practices, you’ll hear a lot about hermeticity. Fully hermetic builds are builds that depend on nothing but the files in the codebase. If you take this to an extreme, a fully, truly hermetic build will even build the compiler along the way, or use a pre-built one that’s checked into the version control system. Our cc_binary rules are probably not hermetic to that degree; they likely invoke something like /usr/bin/gcc. It may seem excessive to sweat about that, but it’s a real concern. Imagine the computing system in your vehicle: you wouldn’t want the code for your cruise control system to differ just because your car company built the software on a different machine. You would certainly expect them to know every byte of that software; your life may depend on it. So there are cases where you want to take the hermeticity principle to an extreme.

Less scary than the example above: imagine the HTTP endpoint serving the Go rules went down. The WORKSPACE file has 2 mirrors listed, but they could both be down. And what if you need to run the build right at that moment, and it’s mission critical? In that case, you may want to check the code for these dependencies into your codebase (vendor them). You don’t want your build to depend on some server outside of your control being up.

The Bazel and Google philosophy is, in general, to make things as hermetic as possible. Obviously, if you’re a single engineer on an open source project, you don’t have the resources to spend time vendoring code in. Sometimes it’s more complicated than just checking in the code: for example, what if some tool you want to invoke in your build flow is not built with Bazel? You now have to set up your build flow to deal with that as well. It may be easier to just reference the pre-built binary dependency via HTTP in your WORKSPACE file and call it a day. You can do either with Bazel; just do what’s easier for you and what makes more sense.
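As one concrete option, if you do vendor a dependency into your repo, Bazel’s built-in local_repository rule can point at the checked-in copy instead of the network (the path here is made up, and the vendored directory needs its own WORKSPACE file):

local_repository(
    name = "rules_python",
    path = "third_party/rules_python",
)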

Remote execution

Let’s remind ourselves of the Wix use case for Bazel, quoting the users page:

Wix is a cloud-based web development platform. Their backend uses Java and Scala code. They use remote execution with Google Cloud Build.

If you recall the 3 phases of a build, the last one is the execution phase, where all the magic really happens: invoking the compiler, linker, etc. Bazel supports offloading those actions to a remote build server. There’s a specific protocol behind it, and the idea is that you simply point your Bazel build at a server endpoint and it transparently offloads tasks to the remote server. If you’re running massive builds, one way to improve speed is to run a cluster of build workers in Google Cloud and use them as the backend for your Bazel builds via remote execution.
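Wiring this up is mostly a matter of flags. Assuming you have an endpoint that speaks the remote execution protocol, something like the following should offload the actions (the endpoint here is a placeholder):

./bazelisk-darwin-arm64 build --remote_executor=grpc://remotebuild.example.com:8980 //server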

Some additional info

The upcoming module system

If you remember how we set up transitive remote dependencies in our project, it was a bit painful. We had to explicitly call a function from our direct dependency to set up the transitive ones. That’s clunky.

Bazel has thus come up with a module system (Bzlmod). At the time of writing it’s still quite new, so I won’t go into the details here, but it’s the up-and-coming way to do things.
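For the curious, the module system revolves around a MODULE.bazel file in which you declare your direct dependencies and let Bazel resolve the transitive ones for you. A sketch of what our Python rules dependency might look like there (names and versions for illustration only):

module(name = "bazel_example", version = "0.1.0")

bazel_dep(name = "rules_python", version = "0.26.0")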

Cross compiling

As the title says: you can build all software with Bazel. This includes your apps for Android, iOS, etc. However, that is building for another platform, a process known as cross-compiling. Stay tuned for follow-up articles on this; I will go back to my whole embedded story and cover it there.

Below are some recommended exercises for the reader to do independently.

Try experimenting with writing custom rules that depend on other rules. Use providers as suggested above to pass useful information from the dependency upwards.

Bonus points if you get into depsets.

A fun exercise that you can do here is write a rule that will generate code in some language, e.g. Java, and then use that generated code as a source file for a binary or a library. You can have a lot of fun playing with the JavaPoet library to dynamically generate some Java code and use those generated sources in the Java binary. It’s a great experience.

I mentioned above that sometimes we depend on something that is not built with Bazel. Gazelle can help here: it’s a tool that can turn a “normal” workspace into a Bazelized one. Try playing with it on a Go project not built with Bazel and see what it generates.

Extra points if you depend on a different GitHub repository of, say, a Go project, and use Gazelle as part of your build flow to transparently turn that dependency into a Bazel project, then use it as a dep in your Bazel flow.

Conclusion

You should now have a very good idea of how to use Bazel as an advanced user and you should be able to build like a Google engineer. This was a lengthy guide, but I hope worth your attention. I would like to finish with a few more thoughts.

First, the user didn’t need to have pretty much anything installed in order to build things in the project. My suspicion is that there are only 2 non-hermetic dependencies here:

  1. We needed something to build C++, like GCC. It had to come from somewhere, since we didn’t check the GCC tools into the Git repo directly.
  2. Python is needed to run the Python tool, though I’m not sure whether the Python rules for Bazel can provide that runtime as well for the run commands; it’s certainly possible.

Other than that, there was really nothing we needed the user of our project to have installed. The Go toolchain was obtained on the fly, as well as Bazel itself. We vendored in the Bazelisk wrapper that was executable just like that, and it fetched Bazel for us dynamically.

Second, we never checked anything generated into the codebase. We made those generated images appear to the rest of the build flow as real PNG files through pre-declared outputs, but they were really generated. This is generally the approach Bazel projects take: don’t check in generated files, generate them on the fly, and rely on the Bazel cache to ensure things aren’t rebuilt redundantly when nothing changes.

Lastly, even though Bazel is advertised as fully reproducible and deterministic, you can probably identify a ton of spots where the project itself can break this. Dependencies coming from HTTP endpoints can change unless we pin SHA-256 hashes, pip may decide to give us something different each time, our internal build tools may be non-deterministic in nature, and so on. This is not Bazel’s fault (Bazel itself is very much deterministic and its builds are reproducible), but what you’re building can easily introduce some randomness accidentally. Therefore, if you end up with something slightly different byte-for-byte while building the sample project, it’s very likely my fault. :)

I hope this was useful!

Please consider following on Twitter and LinkedIn to stay updated.