AI/ML Infrastructure Engineer
Canada (Remote)
Full-time (Permanent)
Job Description:
We are looking to hire a skilled AI/ML Infrastructure Engineer immediately.
Role Overview:
We are looking for an experienced Infrastructure Engineer to design, automate, and operate scalable cloud infrastructure supporting data platforms and AI/ML workloads across GCP and Azure. This role focuses on Infrastructure as Code (IaC), CI/CD automation, and cloud networking, and on enabling reliable, secure environments for data engineering and analytics teams.
Key Responsibilities:
- Design, provision, and manage cloud infrastructure using Terraform
- Build and maintain CI/CD pipelines using Azure DevOps
- Provision and manage GCP infrastructure, including compute, storage, IAM, and networking
- Support and manage Azure infrastructure (VNets, networking, compute, storage)
- Design and implement network provisioning (VPC/VNet architecture, routing, firewalls, load balancers, private connectivity)
- Build and operate infrastructure for data platforms (data lakes, warehouses, streaming, analytics platforms)
- Provision and support AI/ML infrastructure, including GPU resources and AI platforms
- Implement security best practices, IAM, encryption, and compliance controls
- Optimize infrastructure for performance, reliability, and cost
- Collaborate with data engineering, analytics, and ML teams
- Document infrastructure, architecture, standards, and operational runbooks
Required Skills & Qualifications:
- Strong experience with Terraform (Infrastructure as Code)
- Experience with CI/CD pipelines, preferably Azure DevOps
- Strong hands-on experience with Google Cloud Platform (GCP)
- Solid understanding of cloud networking and network provisioning
- Experience supporting data platforms or large-scale data workloads
- Experience with AI/ML infrastructure, such as GPU resources and AI platforms
- Strong Linux and scripting skills (Bash, Python, etc.)
Preferred / Nice to Have:
- Hands-on experience with Azure infrastructure
- Experience with Kubernetes (GKE / AKS)
- Experience with data services such as BigQuery, Dataflow, Dataproc, Synapse, ADLS, Snowflake
- Experience with monitoring and observability tools (Prometheus, Grafana, Cloud Monitoring)
- Multi-cloud experience and relevant certifications