Klay Beam is a toolkit for running Apache Beam jobs that process audio data. It can be used for parallelizing audio transformations and feature extraction. The basic workflow:
- Write an audio transformation or feature-extraction pipeline
- Run and test the pipeline locally
- Launch a job with the GCP Dataflow runner, scaling up execution to hundreds or thousands of concurrent processes.
This repository bundles:
- The klay_beam Python package, with basic utilities for reading, transforming, and writing audio in Beam pipelines.
- Docker image build processes for images that can be used in pipelines executed with the Dataflow runner on GCP. Core images support Python 3.9, 3.10, and 3.11, as well as PyTorch (optionally with CUDA support). (See: hub.docker.com/r/klaymusic/klay-beam/)
- Examples of simple pipeline launch scripts for running jobs locally and on GCP Dataflow.
See the Python package README in klay_beam for more information.