What is bpftrace?

bpftrace is both a CLI tool and a tracing language that compiles down to Linux extended Berkeley Packet Filter (eBPF) instructions. The BPF VM subsystem in the Linux kernel is immensely powerful, and covering it would require a separate article. For now, think of eBPF as a mechanism that lets userspace load sandboxed logic into privileged kernel contexts. bpftrace uses eBPF to attach dynamic tracing instrumentation, and the bpftrace language can work with traditional static tracing probes as well.

Different types of probes available

The bpftrace reference guide in the source tree demonstrates the different probe types. As I learn more about bpftrace, I will expand this post with more detail on each probe type where it seems worth adding. For now, I will cover the kernel build configuration options required to support these probes.

kprobe / kretprobe

Minimal configuration required.

  CONFIG_BPF_EVENTS=y
  CONFIG_KPROBES=y

Additional helpful configuration options.

  CONFIG_BPF_KPROBE_OVERRIDE=y # Lets a BPF program skip a kprobed function and set its return value (error injection)
  CONFIG_KPROBES_ON_FTRACE=y   # Lets kprobes at function entry reuse ftrace call sites instead of breakpoints
  CONFIG_KPROBE_EVENTS=y       # Support dynamically inserting tracing events using kprobes
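
As a rough illustration of what the kprobe / kretprobe pair enables once these options are set, the sketch below wraps a latency histogram in a shell function. The name trace_latency is hypothetical and vfs_read is only an example target. Running it assumes bpftrace is installed and invoked as root.

```shell
# trace_latency FUNC (hypothetical helper, not part of bpftrace)
# Attaches a kprobe at the entry and a kretprobe at the return of the
# kernel function FUNC. When bpftrace exits (Ctrl-C), it prints a
# per-process histogram of the time spent in FUNC, in nanoseconds.
trace_latency() {
    func=$1
    bpftrace -e "
kprobe:${func} { @start[tid] = nsecs; }
kretprobe:${func} /@start[tid]/ {
    @ns[comm] = hist(nsecs - @start[tid]);
    delete(@start[tid]);
}"
}
```

For example, trace_latency vfs_read would time every vfs_read call on the system until interrupted.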

kfunc / kretfunc

Minimal configuration required.

  CONFIG_DEBUG_INFO_BTF=y

Additional helpful configuration options.

  CONFIG_MODULE_ALLOW_BTF_MISMATCH=y # Loads modules whose BTF does not match the running kernel, skipping their BTF instead of failing
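
A kernel built with CONFIG_DEBUG_INFO_BTF=y exposes its type information at /sys/kernel/btf/vmlinux, so a quick runtime check is possible without digging up the build configuration. The helper below is a minimal sketch. The name has_kernel_btf is hypothetical, and the optional path argument exists only to make the function easy to try against other files.

```shell
# has_kernel_btf [PATH] (hypothetical helper)
# Reports whether the kernel BTF blob is readable. PATH defaults to the
# location where kernels built with CONFIG_DEBUG_INFO_BTF=y expose it.
has_kernel_btf() {
    btf_path=${1:-/sys/kernel/btf/vmlinux}
    if [ -r "$btf_path" ]; then
        printf 'BTF available: %s\n' "$btf_path"
    else
        printf 'BTF not found: %s\n' "$btf_path"
        return 1
    fi
}
```

If the function reports that BTF is missing, kfunc / kretfunc probes should not be expected to work on that kernel.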

uprobe / uretprobe

Minimal configuration required.

  CONFIG_UPROBES=y

Additional helpful configuration options.

  CONFIG_UPROBE_EVENTS=y # Support dynamically setting uprobes/uretprobes using memory offsets of userspace programs
                         # Link: https://docs.kernel.org/trace/uprobetracer.html
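
To give a feel for what a uretprobe looks like in practice, here is a sketch that prints every line bash reads through readline. The name trace_readline is hypothetical. It assumes the bash binary exports a readline symbol, which is not true of every build, and that bpftrace runs as root. The binary path is a parameter because its location differs between distributions.

```shell
# trace_readline [BASH_PATH] (hypothetical helper)
# Attaches a uretprobe to the readline symbol of a bash binary and
# prints the string each call returns. BASH_PATH defaults to the bash
# found in PATH.
trace_readline() {
    bash_path=${1:-$(command -v bash)}
    bpftrace -e "uretprobe:${bash_path}:readline { printf(\"readline: %s\\n\", str(retval)); }"
}
```

With this running in one terminal, commands typed into an interactive bash session elsewhere should show up as they are entered.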

tracepoint

Minimal configuration required.

  CONFIG_TRACEPOINTS=y

Additional helpful configuration options.

  CONFIG_TRACEPOINT_BENCHMARK=y # For benchmarking the tracepoint feature in the kernel using a kernel tracepoint

The kernel documentation describes how to implement tracepoints in kernel code. Even without bpftrace, mechanisms such as event tracing can be used to consume tracepoint activity.
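
Event tracing also offers a way to see which tracepoints a kernel provides: tracefs lists them in an available_events file, one subsystem:event pair per line. The sketch below filters that list and prints the matches in the tracepoint:category:name form that bpftrace uses. The name list_trace_events is hypothetical, and the default path assumes tracefs is mounted at /sys/kernel/tracing, which usually requires root to read.

```shell
# list_trace_events PATTERN [EVENTS_FILE] (hypothetical helper)
# Filters the tracefs event list with an extended regular expression
# and prefixes each match with "tracepoint:" to get the bpftrace probe
# name for it.
list_trace_events() {
    pattern=$1
    events_file=${2:-/sys/kernel/tracing/available_events}
    grep -E -- "$pattern" "$events_file" | sed 's/^/tracepoint:/'
}
```

For example, list_trace_events '^syscalls:sys_enter_open' would list the syscall entry tracepoints whose names start with open, ready to paste into a bpftrace program.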

Knowing what's supported on your kernel

Running bpftrace --info provides information on what is and is not supported.

bpftrace --info
System
  OS: Linux 6.1.31 #1-NixOS SMP PREEMPT_DYNAMIC Tue May 30 13:03:33 UTC 2023
  Arch: x86_64

Build
  version: v0.18.0
  LLVM: 14.0.6
  unsafe probe: no
  bfd: yes
  libdw (DWARF support): yes

Kernel helpers
  probe_read: yes
  probe_read_str: yes
  probe_read_user: yes
  probe_read_user_str: yes
  probe_read_kernel: yes
  probe_read_kernel_str: yes
  get_current_cgroup_id: yes
  send_signal: yes
  override_return: no
  get_boot_ns: yes
  dpath: yes
  skboutput: yes

Kernel features
  Instruction limit: 1000000
  Loop support: yes
  btf: yes
  module btf: yes
  map batch: yes
  uprobe refcount (depends on Build:bcc bpf_attach_uprobe refcount): yes

Map types
  hash: yes
  percpu hash: yes
  array: yes
  percpu array: yes
  stack_trace: yes
  perf_event_array: yes

Probe types
  kprobe: yes
  tracepoint: yes
  perf_event: yes
  kfunc: yes
  iter:task: yes
  iter:task_file: yes
  iter:task_vma: yes
  kprobe_multi: no
  raw_tp_special: yes
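
The report is long, and usually the interesting part is what is missing. A small filter like the sketch below keeps only the entries marked "no". The name list_unsupported is hypothetical, and it assumes the "name: yes/no" layout shown above.

```shell
# list_unsupported (hypothetical helper)
# Reads a `bpftrace --info` report on stdin and prints only the entries
# whose value is "no", with the leading indentation removed.
list_unsupported() {
    sed -n 's/^[[:space:]]*\(.*: no\)$/\1/p'
}
```

It can be used as bpftrace --info 2>&1 | list_unsupported, where 2>&1 is included in case the report is written to stderr. Against the output above, it would print unsafe probe, override_return and kprobe_multi.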

Many of the kernel-dependent features require certain configuration options to be selected. The output above reflects the default build configuration NixOS uses for the linuxPackages_latest kernel, which is convenient for demonstrations. However, I need to compile the kernel myself for development purposes, which is why I listed the configuration options needed for each type of probe.
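
When building a kernel by hand, it helps to check a config file for these options before compiling. The sketch below does that for a plain-text kernel config. The name check_kconfig is hypothetical. A config can typically be found as .config in the build tree, or extracted from /proc/config.gz on kernels built with CONFIG_IKCONFIG_PROC.

```shell
# check_kconfig CONFIG_FILE OPTION... (hypothetical helper)
# Prints "OPTION: set" when CONFIG_FILE contains OPTION=y or OPTION=m,
# and "OPTION: missing" otherwise. Returns non-zero if any option is
# missing.
check_kconfig() {
    config_file=$1
    shift
    status=0
    for opt in "$@"; do
        if grep -Eq "^${opt}=(y|m)\$" "$config_file"; then
            printf '%s: set\n' "$opt"
        else
            printf '%s: missing\n' "$opt"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_kconfig .config CONFIG_BPF_EVENTS CONFIG_KPROBES CONFIG_UPROBES CONFIG_DEBUG_INFO_BTF covers the minimal options for kprobe, uprobe and kfunc probes in one pass.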

Useful resources for learning more about bpftrace

Honestly, I am pretty new to both bpftrace and eBPF myself, and I plan on updating this page as I continue to learn more. One of my goals is learning how to use the stackcollapse-bpftrace.pl script for generating flamegraphs. Right now, I use perf for that. I am also collecting useful bpftrace snippets that I build along my journey as a kernel developer and systems enthusiast. These snippets can be found in my GitHub repository, Binary-Eater/bpftrace-scripts.

In general, the iovisor/bpftrace GitHub repository has a nice reference and one-liner tutorial for new users to follow along. The manpage is an even more thorough resource.

The tools/ directory of the bpftrace repository also serves as a great reference.

Brendan Gregg's blog has a number of additional examples as well.