A large Wealth Management firm operating under a Broker-Dealer model is seeking an experienced Site Reliability Engineer to support feature development on its newly built Trading Platform. The platform has been in development for two years and is currently in a stabilization phase, with a production launch targeted in four months.
Req# 1023988611
Responsibilities
- Implement and champion DevOps and SRE best practices across the organization
- Drive technology roadmap discussions for the SRE team
- Define, craft, and maintain SLIs and SLOs, along with key metrics including MTTR, Lead Time for Change, Deployment Frequency, and Change Failure Rate
- Design, develop, and manage monitoring, alerting, and observability solutions using Dynatrace, Splunk, and Grafana
- Conduct performance assessments, identify bottlenecks, and recommend enhancements to improve system performance
- Partner with application teams to enforce performance and availability SLAs
- Collaborate with product owners to manage error budgets, prioritize toil backlogs, and validate against team, application, and incident metrics
- Participate in an on-call rotation to respond to production events and outages
- Continuously improve CI/CD pipelines and deployment processes
- Lead troubleshooting efforts, incident management, and root cause analysis
- Identify and build automated processes wherever possible
- Implement cybersecurity measures through ongoing vulnerability assessments and risk management
- Provide periodic progress reports to management and stakeholders
- Partner with application teams to support and ease their adoption of the platform
- Facilitate clear coordination and communication within the team and with customers
- Analyze existing systems and develop plans for enhancements and improvements
Requirements
- Bachelor's degree in Computer Science or a related field, and/or equivalent work experience
- 5+ years of experience working within DevOps or SRE teams
- Proven experience supporting production infrastructure
- Strong knowledge of CI/CD principles and pipelines
- Solid understanding of observability concepts, including monitoring, logging, and tracing
- Hands-on experience with Dynatrace and Splunk
- Experience with at least one major cloud provider (AWS, Azure, or GCP)
- Demonstrated experience operating high-availability, fault-tolerant, scalable, and distributed systems in production
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our clients, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Engineer the Future with a Career at EPAM
EPAM Canada welcomes and encourages applications from candidates with disabilities. Please contact WFA Human Resource CA [email protected] if you have questions in this regard, or if you require an accommodation to complete the application process. Click here to review EPAM’s Accessibility for Ontarians with Disabilities Accessibility Policies and Multi-Year Access.
An artificial intelligence system is software that is developed with one or more techniques that can, for a given set of human-defined objectives, using algorithmic information processing, generate outputs such as content, predictions, recommendations, or decisions with varying levels of autonomy (“AI”). Tasks that humans have traditionally done by thinking and reasoning are increasingly being done by, or with the help of, AI to help create efficiencies.
EPAM may use AI during the recruitment process, in connection with collecting or processing your personal data. Some (non-exhaustive) examples of tasks that EPAM may use AI for include conducting initial screening, creating transcripts of interviews, and assessing applications/CVs against defined job description criteria to make suggestions to the individuals evaluating your candidacy.
Your personal data and the results of any processing are not shared with AI applications outside of EPAM infrastructure. While EPAM may use AI to help create efficiencies during the recruitment process, EPAM does not use AI to make hiring decisions, which are made by EPAM Talent Acquisition and management.