Setup "debugging" and misc cleanup (sched-ext#695)
* Fix a couple of misc errors in build scripts.
* Tweak scripts/kconfigs to make bpftrace work.
* Update how CI caching works to make builds faster (6 minute turnaround
  time).
* Update CI config to generate per-scheduler debug archives with guest
  dmesg/scheduler stdout, guest stdout, bpftrace script output, and
  veristat output.

* Update build scripts to accept the following:
** VNG RW -- write to host filesystem (better caching, logging).
* For stress tests in particular (via ini config):
** QEMU Opts -- to facilitate reproducing bugs (e.g. high core count).
** bpftrace scripts -- specify bpftrace scripts to run during stress
   tests.
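
To make the stress-test options above concrete, here is a minimal sketch of how an ini entry carrying them might be read with Python's `configparser`. The key names (`sched`, `sched_args`, `stress_cmd`, `timeout_sec`, `qemu_opts`, `bpftrace_scripts`) are the ones `run_stress_tests` reads in this commit. The section name, all values, and the `load_stress_config` helper are hypothetical and only for illustration.

```python
import configparser

# Hypothetical ini entry: section name and values are made up for illustration.
EXAMPLE_INI = """
[scx_example]
sched = scx_example
sched_args = -v
stress_cmd = stress-ng -t 14 --cpu 4
timeout_sec = 15
qemu_opts = -smp 64
bpftrace_scripts = dsq_lat.bt
"""


def load_stress_config(text: str) -> dict[str, dict[str, str]]:
    """Parse ini text into a {section: {key: value}} mapping."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


configs = load_stress_config(EXAMPLE_INI)
entry = configs["scx_example"]
timeout_sec = int(entry["timeout_sec"])           # stored as text, so convert
qemu_opts = entry.get("qemu_opts")                # None when the key is absent
bpftrace_scripts = entry.get("bpftrace_scripts")  # None when the key is absent
```

Both new keys are optional in this sketch, so an entry without them still parses and simply yields `None`.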
likewhatevs authored Sep 26, 2024
1 parent 818e829 commit bf68679
Showing 9 changed files with 209 additions and 83 deletions.
30 changes: 21 additions & 9 deletions .github/actions/install-deps-action/action.yml
@@ -4,23 +4,35 @@ runs:
using: 'composite'
steps:
### OTHER REPOS ####

# Hard turn-off interactive mode
- run: echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
shell: bash

# Refresh packages list
- run: sudo apt update
# turn off interactive, refresh pkgs, use apt fast
- run: |
echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
sudo rm /var/lib/man-db/auto-update
echo "deb [signed-by=/etc/apt/keyrings/apt-fast.gpg] http://ppa.launchpad.net/apt-fast/stable/ubuntu $(source /etc/os-release && echo $UBUNTU_CODENAME) main" | sudo tee /etc/apt/sources.list.d/apt-fast.list
wget -q -O- "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0xBC5934FD3DEBD4DAEA544F791E2824A7F22B44BD" | sudo gpg --dearmor -o /etc/apt/keyrings/apt-fast.gpg
sudo apt-get update -y
sudo apt-get install -y apt-fast aria2 tasksel
echo 'debconf apt-fast/maxdownloads string 100' | sudo debconf-set-selections
echo 'debconf apt-fast/dlflag boolean true' | sudo debconf-set-selections
echo 'debconf apt-fast/aptmanager string apt-get' | sudo debconf-set-selections
sudo tasksel remove ubuntu-desktop
shell: bash
### DOWNLOAD AND INSTALL DEPENDENCIES ###

# Download dependencies packaged by Ubuntu
- run: sudo apt -y install bison busybox-static cargo cmake coreutils cpio elfutils file flex gcc gcc-multilib git iproute2 jq kbd kmod libcap-dev libelf-dev libunwind-dev libvirt-clients libzstd-dev linux-headers-generic linux-tools-common linux-tools-generic make ninja-build pahole pkg-config python3-dev python3-pip python3-requests qemu-kvm rsync rustc stress-ng udev zstd libseccomp-dev libcap-ng-dev llvm clang python3-full pipx curl meson
- run: |
sudo apt-fast install -f -y bison busybox-static cmake coreutils \
cpio elfutils file flex gcc gcc-multilib git iproute2 jq kbd kmod \
libcap-dev libelf-dev libunwind-dev libvirt-clients libzstd-dev \
linux-headers-generic linux-tools-common linux-tools-generic make \
ninja-build pahole pkg-config python3-dev python3-pip python3-requests \
qemu-kvm rsync stress-ng udev zstd libseccomp-dev libcap-ng-dev \
llvm clang python3-full curl meson bpftrace cargo rustc dwarves
shell: bash
# virtme-ng
- run: pip3 install virtme-ng --break-system-packages
- run: sudo pip3 install virtme-ng --break-system-packages
shell: bash

# Setup KVM support
107 changes: 64 additions & 43 deletions .github/workflows/caching-build.yml
@@ -6,7 +6,7 @@ on:
- cron: "0 * * * *"
push:
pull_request:

jobs:
lint:
runs-on: ubuntu-24.04
@@ -32,8 +32,8 @@ jobs:
- run: sudo chown root /usr/bin/tar && sudo chmod u+s /usr/bin/tar
# redundancy to exit fast
- run: echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
- run: sudo apt update
- run: sudo apt install -y git --no-install-recommends
- run: sudo apt-get update
- run: sudo apt-get install -y git --no-install-recommends
# get latest head commit of sched_ext for-next
- run: echo "SCHED_EXT_KERNEL_COMMIT=$(git ls-remote https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git heads/for-next | awk '{print $1}')" >> $GITHUB_ENV

@@ -45,8 +45,10 @@ jobs:
uses: actions/cache@v4
with:
path: |
linux
key: kernel-build-${{ env.SCHED_EXT_KERNEL_COMMIT }}
linux/arch/x86/boot/bzImage
linux/usr/include
linux/**/*.h
key: kernel-build-${{ env.SCHED_EXT_KERNEL_COMMIT }}-4
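
The cache key above combines the kernel head commit with a trailing suffix (here `-4`), so the key changes whenever either part changes. As a rough illustration of that derivation, the sketch below takes the first column of `git ls-remote` output and formats the key. The function name `kernel_cache_key` and the sample hash are hypothetical, and reading the suffix as a manually bumped cache version is an assumption rather than something the commit states.

```python
def kernel_cache_key(ls_remote_output: str, version: str = "4") -> str:
    """Build a cache key from `git ls-remote` output.

    Each output line is "<commit hash><TAB><ref name>"; only the hash of the
    first line is used, which mirrors the awk '{print $1}' in the workflow.
    """
    lines = ls_remote_output.strip().splitlines()
    if not lines:
        raise ValueError("no ref found in ls-remote output")
    commit = lines[0].split()[0]
    return f"kernel-build-{commit}-{version}"


# Hypothetical output line; the hash is made up.
sample = "0123456789abcdef0123456789abcdef01234567\trefs/heads/for-next\n"
key = kernel_cache_key(sample)
```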

- if: ${{ steps.cache-kernel.outputs.cache-hit != 'true' }}
uses: ./.github/actions/install-deps-action
@@ -62,14 +64,6 @@ jobs:
- if: ${{ steps.cache-virtiofsd.outputs.cache-hit != 'true' && steps.cache-kernel.outputs.cache-hit != 'true' }}
run: cargo install virtiofsd && sudo cp -a ~/.cargo/bin/virtiofsd /usr/lib/

# cache bzImage alone for rust tests (disk space limit workaround)
- name: Cache bzImage
id: cache-bzImage
uses: actions/cache@v4
with:
path: |
linux/arch/x86/boot/bzImage
key: kernel-bzImage-${{ env.SCHED_EXT_KERNEL_COMMIT }}

- if: ${{ steps.cache-kernel.outputs.cache-hit != 'true' }}
name: Clone Kernel
@@ -96,14 +90,18 @@ jobs:
integration-test:
runs-on: ubuntu-24.04
needs: build-kernel
continue-on-error: true
strategy:
matrix:
scheduler: [ scx_bpfland, scx_lavd, scx_layered, scx_rlfifo, scx_rustland, scx_rusty ]
fail-fast: false
steps:
# prevent cache permission errors
- run: sudo chown root /usr/bin/tar && sudo chmod u+s /usr/bin/tar
- uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
with:
key: ${{ matrix.scheduler }}
prefix-key: "4"
- uses: ./.github/actions/install-deps-action
# cache virtiofsd (goes away w/ 24.04)
- name: Cache virtiofsd
@@ -125,8 +123,10 @@ jobs:
uses: actions/cache@v4
with:
path: |
linux
key: kernel-build-${{ env.SCHED_EXT_KERNEL_COMMIT }}
linux/arch/x86/boot/bzImage
linux/usr/include
linux/**/*.h
key: kernel-build-${{ env.SCHED_EXT_KERNEL_COMMIT }}-4

# need to re-run job when kernel head changes between build and test running.
- if: ${{ steps.cache-kernel.outputs.cache-hit != 'true' }}
@@ -139,7 +139,7 @@ jobs:
- run: sudo chmod +x /usr/bin/veristat && sudo chmod 755 /usr/bin/veristat

# The actual build:
- run: meson setup build -Dkernel=$(pwd)/linux -Dkernel_headers=./linux/usr/include -Denable_stress=true
- run: meson setup build -Dkernel=../linux/arch/x86/boot/bzImage -Dkernel_headers=../linux -Denable_stress=true -Dvng_rw_mount=true
- run: meson compile -C build ${{ matrix.scheduler }}

# Print CPU model before running the tests (this can be useful for
@@ -148,13 +148,35 @@ jobs:

# Test schedulers
- run: meson compile -C build test_sched_${{ matrix.scheduler }}
# this is where errors we want logs on start occurring, so always generate debug info and save logs
if: always()
# Stress schedulers
- uses: cytopia/shell-command-retry-action@v0.1.2
name: stress test
if: always()
with:
retries: 3
command: meson compile -C build stress_tests_${{ matrix.scheduler }}
- run: meson compile -C build veristat_${{ matrix.scheduler }}
if: always()
- run: sudo cat /var/log/dmesg > host-dmesg.ci.log
if: always()
- run: echo "NICE_REF=${{ github.event.pull_request && github.head_ref || github.ref_name }}" >> $GITHUB_ENV
if: always()
- run: mkdir -p ./log_save/
if: always()
# no symlink following here (to avoid cycles)
- run: sudo find '/home/runner/' -iname '*.ci.log' -exec mv {} ./log_save/ \;
if: always()
- name: upload debug logs, bpftrace, veristat, dmesg, etc.
if: always()
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.scheduler }}_logs_${{ env.NICE_REF }}_${{ github.run_id }}_${{ github.run_attempt }}
path: ./log_save/*.ci.log
# it's all txt files w/ 90 day retention, lets be nice.
compression-level: 9
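
The log-gathering steps above (create `./log_save/`, move every `*.ci.log` file into it, then upload) can be sketched in Python as follows. This is an illustrative equivalent of the `find ... -iname ... -exec mv` step, not code from the repository. The helper name `collect_ci_logs` is hypothetical, and skipping the destination directory during the walk is a choice made for this sketch.

```python
import fnmatch
import os
import shutil


def collect_ci_logs(root: str, dest: str, pattern: str = "*.ci.log") -> list[str]:
    """Move files under root whose names match pattern (case-insensitively) into dest."""
    os.makedirs(dest, exist_ok=True)
    dest_abs = os.path.abspath(dest)
    moved = []
    # followlinks=False: do not descend into symlinked directories (avoids cycles).
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Prune the destination so already-collected logs are not visited again.
        dirnames[:] = [
            d for d in dirnames
            if os.path.abspath(os.path.join(dirpath, d)) != dest_abs
        ]
        for name in filenames:
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                target = os.path.join(dest_abs, name)
                # Files with the same name from different directories overwrite
                # each other, as they would with a plain mv.
                shutil.move(os.path.join(dirpath, name), target)
                moved.append(target)
    return sorted(moved)
```

Because everything lands in one flat directory, log files need distinct names (for example, a scheduler or test name in the file name) to survive collection.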


rust-test-core:
runs-on: ubuntu-24.04
@@ -166,6 +188,10 @@ jobs:
# prevent cache permission errors
- run: sudo chown root /usr/bin/tar && sudo chmod u+s /usr/bin/tar
- uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
with:
key: ${{ matrix.component }}
prefix-key: "4"
- uses: ./.github/actions/install-deps-action
# cache virtiofsd (goes away w/ 24.04)
- name: Cache virtiofsd
@@ -180,28 +206,24 @@ jobs:

# get latest head commit of sched_ext for-next
- run: echo "SCHED_EXT_KERNEL_COMMIT=$(git ls-remote https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git heads/for-next | awk '{print $1}')" >> $GITHUB_ENV
# cache bzImage alone for rust tests
- name: Cache bzImage
id: cache-bzImage

- name: Cache Kernel
id: cache-kernel
uses: actions/cache@v4
with:
path: |
linux/arch/x86/boot/bzImage
key: kernel-bzImage-${{ env.SCHED_EXT_KERNEL_COMMIT }}
linux/usr/include
linux/**/*.h
key: kernel-build-${{ env.SCHED_EXT_KERNEL_COMMIT }}-4

# need to re-run job when kernel head changes between build and test running.
- if: ${{ steps.cache-bzImage.outputs.cache-hit != 'true' }}
- if: ${{ steps.cache-kernel.outputs.cache-hit != 'true' }}
name: exit if cache stale
run: exit -1

- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
key: ${{ matrix.component }}
prefix-key: "1"
- run: cargo build --manifest-path rust/${{ matrix.component }}/Cargo.toml
- run: cargo test --manifest-path rust/${{ matrix.component }}/Cargo.toml --no-run
- run: vng -v --memory 10G --cpu 8 -r linux/arch/x86/boot/bzImage --net user -- cargo test --manifest-path rust/${{ matrix.component }}/Cargo.toml
- run: vng -v --rw --memory 10G --cpu 8 -r linux/arch/x86/boot/bzImage --net user -- cargo test --manifest-path rust/${{ matrix.component }}/Cargo.toml

rust-test-schedulers:
runs-on: ubuntu-24.04
@@ -213,6 +235,10 @@ jobs:
# prevent cache permission errors
- run: sudo chown root /usr/bin/tar && sudo chmod u+s /usr/bin/tar
- uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
with:
key: ${{ matrix.scheduler }}
prefix-key: "4"
- uses: ./.github/actions/install-deps-action
# cache virtiofsd (goes away w/ 24.04)
- name: Cache virtiofsd
@@ -227,28 +253,24 @@ jobs:

# get latest head commit of sched_ext for-next
- run: echo "SCHED_EXT_KERNEL_COMMIT=$(git ls-remote https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git heads/for-next | awk '{print $1}')" >> $GITHUB_ENV
# cache bzImage alone for rust tests
- name: Cache bzImage
id: cache-bzImage
# Cache Kernel alone for rust tests
- name: Cache Kernel
id: cache-kernel
uses: actions/cache@v4
with:
path: |
linux/arch/x86/boot/bzImage
key: kernel-bzImage-${{ env.SCHED_EXT_KERNEL_COMMIT }}
linux/usr/include
linux/**/*.h
key: kernel-build-${{ env.SCHED_EXT_KERNEL_COMMIT }}-4

# need to re-run job when kernel head changes between build and test running.
- if: ${{ steps.cache-bzImage.outputs.cache-hit != 'true' }}
- if: ${{ steps.cache-kernel.outputs.cache-hit != 'true' }}
name: exit if cache stale
run: exit -1

- uses: Swatinem/rust-cache@v2
with:
workspaces: scheds/rust
key: ${{ matrix.scheduler }}
prefix-key: "1"
- run: cargo build --manifest-path scheds/rust/${{ matrix.scheduler }}/Cargo.toml
- run: cargo test --manifest-path scheds/rust/${{ matrix.scheduler }}/Cargo.toml --no-run
- run: vng -v --memory 10G --cpu 8 -r linux/arch/x86/boot/bzImage --net user -- cargo test --manifest-path scheds/rust/${{ matrix.scheduler }}/Cargo.toml
- run: vng -v --rw --memory 10G --cpu 8 -r linux/arch/x86/boot/bzImage --net user -- cargo test --manifest-path scheds/rust/${{ matrix.scheduler }}/Cargo.toml

pages:
runs-on: ubuntu-24.04
@@ -270,8 +292,7 @@ jobs:
rustup install nightly
export PATH="~/.cargo/bin:$PATH"
RUSTDOCFLAGS="--enable-index-page -Zunstable-options" ~/.cargo/bin/cargo +nightly doc --workspace --no-deps --bins --lib --examples --document-private-items --all-features
sudo apt update
sudo apt install build-essential graphviz sphinx-doc python3-sphinx-rtd-theme texlive-latex-recommended python3-yaml -y
sudo apt-fast install build-essential graphviz sphinx-doc python3-sphinx-rtd-theme texlive-latex-recommended python3-yaml -y
cargo install htmlq
git clone --single-branch -b for-next --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git linux
cd linux
20 changes: 20 additions & 0 deletions .github/workflows/sched-ext.config
@@ -32,3 +32,23 @@ CONFIG_PREEMPT_RCU=y
#
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

# Bpftrace headers (for additional debug info)
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_FUNCTION_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_HAVE_KPROBES=y
CONFIG_KPROBES=y
CONFIG_KPROBE_EVENTS=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_UPROBES=y
CONFIG_UPROBE_EVENTS=y
CONFIG_DEBUG_FS=y
# more bpftrace to make that work
CONFIG_IKHEADERS=y
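
Options requested in a config fragment can be dropped from the final kernel `.config` when their dependencies are not met, so it can help to check the generated file. The sketch below is one way to do that. The option names come from the fragment above, while the `missing_options` helper and the sample text are hypothetical and not part of this commit.

```python
# A subset of the options added above; extend as needed.
REQUIRED = [
    "CONFIG_BPF",
    "CONFIG_BPF_SYSCALL",
    "CONFIG_BPF_JIT",
    "CONFIG_BPF_EVENTS",
    "CONFIG_KPROBES",
    "CONFIG_UPROBES",
    "CONFIG_DEBUG_FS",
    "CONFIG_IKHEADERS",
]


def missing_options(config_text: str, required: list[str]) -> list[str]:
    """Return the required options that are not set to y or m in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


# Hypothetical excerpt of a generated .config.
sample = "CONFIG_BPF=y\n# CONFIG_KPROBES is not set\nCONFIG_DEBUG_FS=y\nCONFIG_IKHEADERS=m\n"
missing = missing_options(sample, REQUIRED)
```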
27 changes: 25 additions & 2 deletions meson-scripts/run_stress_tests
@@ -40,21 +40,36 @@ def run_stress_test(
vng_path: str,
kernel: str,
verbose: bool,
rw: bool,
headers: str,
) -> int:
scheduler_args = config.get('scheduler_args')
stress_cmd = config.get('stress_cmd')
s_path = sched_path(build_dir, config.get('sched'))
sched_cmd = s_path + " " + config.get('sched_args')
timeout_sec = int(config.get("timeout_sec"))
if vng_path:
cmd = [vng_path, "--user", "root", "-v", "-r", kernel]
if config.get("qemu_opts"):
cmd += ['--qemu-opts']
cmd += [f"'{config.get('qemu_opts')}'"]
vm_input = f"{stress_cmd} & timeout --foreground --preserve-status {timeout_sec} {sched_cmd}"
if bpftrace_scripts := config.get('bpftrace_scripts'):
vm_input = f"\"{build_dir}/bpftrace_stress_wrapper.sh\" '{stress_cmd}' '{sched_cmd}' '{timeout_sec}' '{bpftrace_scripts}'"
cmd = [vng_path, "--user", "root", "-v", "--", vm_input]
if headers:
vm_input += f" '{headers}'"
if rw and os.getenv('CI'):
print('mounting VNG as RW because CI')
cmd += ["--rw"]
elif rw:
print('not mounting VNG as RW because not CI')
cmd += ["--"]
cmd += [vm_input]
err = sys.stderr if output == "-" else open(output, "w")
out = sys.stdout if output == "-" else err
print(f"vng cmd is {cmd}")
proc = subprocess.Popen(
cmd, env=os.environ, cwd=kernel, shell=False, stdout=out,
cmd, env=os.environ, shell=False, stdout=out,
stderr=err, stdin=subprocess.PIPE, text=True)
proc.wait()
return proc.returncode
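
The ordering in the new code matters: `vng` options such as `--qemu-opts` and `--rw` go before the `--` separator, and the guest command follows it. A minimal sketch of that assembly, pulled into a pure function so it can be tested without starting a VM, is shown below. `build_vng_cmd` and its `in_ci` parameter are hypothetical names. Unlike the diff, this sketch passes `qemu_opts` as a plain argv element without extra literal quotes; whether `vng` needs those quotes is not settled here.

```python
import os
from typing import Optional


def build_vng_cmd(
    vng_path: str,
    kernel: str,
    vm_input: str,
    qemu_opts: Optional[str] = None,
    rw: bool = False,
    in_ci: Optional[bool] = None,
) -> list[str]:
    """Assemble the vng argv: options first, then "--", then the guest command."""
    if in_ci is None:
        # Same signal the diff uses: a non-empty CI environment variable.
        in_ci = bool(os.getenv("CI"))
    cmd = [vng_path, "--user", "root", "-v", "-r", kernel]
    if qemu_opts:
        # Assumption: passed unquoted as a single argv element.
        cmd += ["--qemu-opts", qemu_opts]
    if rw and in_ci:
        # Only mount read-write in CI, where host writes are disposable.
        cmd.append("--rw")
    cmd += ["--", vm_input]
    return cmd
```

Keeping the command construction separate from `subprocess.Popen` makes the CI-only `--rw` behavior easy to verify in isolation.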
@@ -85,6 +100,8 @@ def stress_tests(args: Namespace) -> None:
vng_path,
args.kernel,
args.verbose,
args.rw,
args.headers
)
for test_name, ret in return_codes.items():
if ret not in (143, 0):
@@ -114,6 +131,12 @@ if __name__ == "__main__":
parser.add_argument(
'--sched', default='', help='Scheduler to test (default: all)'
)
parser.add_argument(
'--rw', default=False, help='Mount VNG Directories as RW (dangerous)'
)
parser.add_argument(
'--headers', default='', help='Kernel Headers Path'
)

args = parser.parse_args()
if args.verbose:
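
One thing to keep in mind with the `--rw` argument as declared above: with only `default=False`, argparse stores whatever string follows the flag, and any non-empty string (including `false`) is truthy in Python. How the build passes this flag is not shown in this diff, so the following is only a sketch of one way to parse it as a real boolean. The `str2bool` and `make_parser` helpers are hypothetical.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert common true/false spellings to a bool for argparse."""
    lowered = value.strip().lower()
    if lowered in ("1", "true", "yes", "on"):
        return True
    if lowered in ("0", "false", "no", "off", ""):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def make_parser() -> argparse.ArgumentParser:
    """Parser with the two arguments added in this commit, --rw typed as bool."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--rw", type=str2bool, default=False,
        help="Mount VNG Directories as RW (dangerous)",
    )
    parser.add_argument("--headers", default="", help="Kernel Headers Path")
    return parser


args = make_parser().parse_args(["--rw", "false", "--headers", "linux"])
```

An alternative is `action="store_true"`, which suits callers that can omit the flag entirely instead of passing a value.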
13 changes: 12 additions & 1 deletion meson-scripts/test_sched
@@ -17,6 +17,16 @@ GUEST_TIMEOUT=60
#
declare -A SCHEDS

VNG_RW=''

# Enable vng rw for when on ci.
if [ $# -ge 3 ] ; then
if [ "$3" == "VNG_RW=true" ]; then
echo 'setting vng to mount rw'
VNG_RW=' --rw '
fi
fi

# enable running tests on individual schedulers
if [ $# -ge 2 ] ; then
SCHEDS[$2]=""
@@ -63,7 +73,7 @@ for sched in ${!SCHEDS[@]}; do

rm -f /tmp/output
timeout --preserve-status ${GUEST_TIMEOUT} \
vng -m 2G -v -r ${kernel} -- \
vng --user root -m 10G --cpu 8 $VNG_RW -v -r ${kernel} -- \
"timeout --foreground --preserve-status ${TEST_TIMEOUT} ${sched_path} ${args}" \
2> >(tee /tmp/output) </dev/null
grep -v " Speculative Return Stack Overflow" /tmp/output | \
@@ -79,4 +89,5 @@ for sched in ${!SCHEDS[@]}; do
else
echo "OK: ${sched}"
fi
cp /tmp/output test_log.ci.log
done