Beatdapp is a company delivering the most advanced streaming integrity technology in the world. One of our ventures is building machine learning inference systems for audio at scale. The work spans music, podcasts, and speech, with a particular focus on AI-generated sound. As generative audio gets cheaper and faster, the platforms we serve need accurate signal about what's real in their content, and they depend on us to provide it.
This role sits at the intersection of ML engineering, platform / infrastructure work, and inference systems. You will bridge the gap between raw audio and the clean signals our detection models depend on. You will partner with data scientists on bringing those models into production, carrying the production lens (latency, cost, customer-facing edges) into design conversations early so trade-offs are made together rather than discovered late.
In practice, the work cuts across the GPU-bound inference containers, the multi-cloud infrastructure that runs them, the API layer in front, the data and observability around them, and the CI that ships it all. The architectural challenge running through all of it is containing drift and scaling with minimal code.
Roadmaps here span weeks, not quarters, and your scope grows and shifts as the team and systems do. So we are hiring for engineering judgment first: a strong feel for clean, scalable design, an eye for code hygiene, and the courage to advocate for a better approach rather than default to consensus.
- Container Engineering and Orchestration: Build, tune, and ship our inference containers. This covers the Dockerfile and its dependencies, image size and cold starts, GPU access patterns, the multi-cloud orchestration that runs the containers (ECS, Cloud Run, GKE, EKS), test coverage for the container surface, and the storage abstraction the containers depend on.
- In-Container Performance and Resource Optimization: Squeeze more out of each GPU instance: concurrency tuning, VRAM accounting, request timeouts and queueing, rate limiting, multi-GPU distribution on instances that have more than one, and the right-sizing decisions that follow.
- Scale and Stress Testing: Build and run scale and stress scenarios across mock deployments that mirror real customer environments. Characterize the latency-vs-throughput curves, find the breaking points, and turn the results into autoscaling and instance-sizing decisions.
- Cloud Infrastructure: Operate the Terraform stack across multiple clouds (GCP, AWS). Networking, identity, GPU nodes, autoscaling, per-tenant account configurations.
- API Layer: Build and extend the customer-facing API layer that fronts the inference service: client authentication, rate limiting, per-client data isolation, and request metering.
- Data Pipelines: Maintain and extend the data orchestration pipelines that feed model evaluation, customer reporting, and operational dashboards.
- Observability: Build and tune the metrics, dashboards, logging, and alarms across three layers: the inference service, the running instances, and the deployed models themselves.
- Related STEM degree (BSc, MSc, or higher) and 3+ years of work experience in platform / infra / backend / ML / applied-ML / data engineering.
- Strong engineering skills: The ability to write clean, scalable, production-grade code in Python or a more performance-oriented language (Go, Rust, C++).
- Architectural fluency across data stores, distributed systems, caching, and data transfer protocols.
- Data engineering skills: Comfort building data processing pipelines and using SQL (Airflow, BigQuery, Postgres).
- Deep cloud infrastructure and networking experience across one or more platforms (GCP, AWS).
- ML platform tooling: comfort with MLflow or similar tooling and model lifecycle processes (model versioning, artifact storage, promotion workflows).
- Terraform: write and modify modules, understand state and backends, and prefer infrastructure as code over console changes.
- CI/CD discipline: cloud OIDC, image signing, pinned versions, an instinct for cheap and reproducible CI.
- Observability instincts: comfortable instrumenting across hardware, application, and model layers (latency, throughput, score distributions, drift). You know which metric to look at first when latency spikes.
- Inference performance tuning: comfort with the levers of a high-throughput GPU service (micro-batching, concurrency, request queueing, in-container resource management).
- Strong written communication: runbooks, design docs, PR descriptions, postmortems, and ticket hygiene (Jira).
Not required, but a strong plus if you bring hands-on work experience with at least one of the following:
- Audio or media systems
- Signal processing
- Detection of synthetic (artificially generated) speech
- Computer vision
- GPU work beyond running inference (CUDA, kernels, drivers, cluster operations)
- Streaming systems (Kafka, Pub/Sub, Kinesis, or similar)