Kubex is seeking a Principal Software Engineer (K8s GPU Optimization) to lead the technical direction and hands-on development of our AI infrastructure optimization capabilities. This is a senior, hands-on technical leadership role within the office of the CTO.
You will act as a principal-level engineer, owning the design and evolution of Kubex’s optimization solutions for Kubernetes-based environments running AI workloads, with a strong emphasis on GPU-accelerated inference. This role carries broad technical ownership and organizational influence, and we are looking for candidates interested in a position that offers both hands-on and people-leadership opportunities.
This role is ideal for someone who pairs deep, practical experience in GPU infrastructure and Kubernetes with the ability to reason about system-level trade-offs, optimization strategies, and real-world customer environments, and who remains excited to write and ship production code.
- Own the technical vision and architecture for Kubex’s AI infrastructure optimization capabilities, with a focus on Kubernetes-based environments running GPU-accelerated workloads.
- Lead the design of systems that automate the optimization of resource configurations and allocations across containers, nodes, GPUs, and autoscaling groups.
- Serve as a senior technical authority within the organization, guiding architectural decisions and influencing broader engineering strategy.
- Contribute directly to production code, remaining deeply hands-on in the design, implementation, and evolution of core platform components.
- Collaborate closely with other senior engineers, product managers, and engineering leadership to coordinate and execute complex software development initiatives.
- Prototype, validate, and productionize new technical approaches related to AI workload optimization.
- Identify opportunities to extend Kubex’s value beyond inference workloads, including potential future optimizations for training or hybrid workloads.
- 10+ years of professional software engineering experience, including significant experience building complex production systems.
- Deep, hands-on experience with GPU-accelerated infrastructure, particularly NVIDIA-based environments.
- Strong knowledge of Kubernetes, including how GPU-backed workloads are scheduled, scaled, and operated in real-world clusters.
- Practical experience with CUDA, GPU telemetry, and performance considerations for AI workloads.
- Proven ability to design and build systems that balance performance, cost efficiency, and operational reliability.
- Strong coding skills and a demonstrated commitment to remaining hands-on with production code.
- Excellent communication skills, with the ability to explain complex technical concepts to both internal and external audiences.
- Experience optimizing or operating large-scale AI inference platforms.
- Familiarity with advanced GPU sharing strategies, including MIG, time-slicing, and MPS, and their implications for scheduling and performance.
- Exposure to optimization-based systems, scheduling, bin-packing, or resource allocation problems.
- Understanding of GPU-specific scheduling technologies such as the KAI scheduler.
- Experience working with autoscaling frameworks such as Kubernetes HPA/VPA or Karpenter.
- Play a key role in shaping the future of AI infrastructure optimization.
- Work on technically challenging problems at the intersection of Kubernetes, GPUs, and AI workloads.
- Collaborate with a highly experienced, deeply technical team.
- Influence product direction, architecture, and external technical positioning.
- Flexible, remote-first culture focused on impact and innovation.
- Competitive compensation, equity, and benefits.