Designation: Associate
Level: L2
Experience: 4 to 7 years
Location: Toronto, Ontario, Canada
Job Description
Job Summary:
This role is for a highly experienced Data Engineer who will lead the design, implementation, and optimization of robust, scalable ETL/ELT pipelines for batch and streaming data. The engineer will drive advanced data modeling, contribute to enterprise data architecture, ensure data quality and governance, and work closely with cross-functional teams to build full-stack data solutions and analytics platforms, drawing on expertise in SQL, Python/PySpark, and cloud environments such as GCP. The position requires strong technical leadership, mentoring skills, and a commitment to data security and performance optimization.
Responsibilities:
1. Data Pipeline Development: Lead the design, implementation, and optimization of robust, scalable ETL/ELT pipelines for batch and streaming data, handling large-scale ingestion from multiple sources into data warehouses or lakes.
2. Data Modeling: Develop advanced data models, schemas, and optimizations to support sophisticated analysis, reporting, and ML workflows, ensuring performance and cost-efficiency.
3. Data Integration: Partner with data scientists, product managers, and engineers to define data requirements, integrate disparate sources, and build full-stack data solutions for analytics platforms.
4. SQL and Coding Expertise: Author and optimize complex SQL queries and Python/PySpark code for data transformation, automation, and real-time processing; implement monitoring for pipeline health.
5. Data Architecture: Contribute to enterprise data architecture by making decisions on storage (e.g., data lakes vs. warehouses), processing frameworks, and scalability, while evaluating operational trade-offs.
6. Security and Compliance: Enforce data security best practices; implement access controls, encryption, and auditing to comply with global regulations and industry standards.
7. Documentation, Collaboration, and Mentorship: Maintain comprehensive documentation of pipelines and architectures; lead cross-functional collaborations to resolve challenges; mentor junior engineers on best practices and code reviews.
8. Innovation and Optimization: Identify opportunities to improve data systems, such as incorporating AI/ML for automation or optimizing for cost and performance in cloud environments.
9. Data Quality, Monitoring & Governance: Define and implement data validation, testing, and monitoring frameworks to ensure accuracy, completeness, and freshness of datasets. Set up proactive alerting and observability on data pipelines and jobs. Enforce and advocate for best practices in data governance, including access control, lineage, and compliance (e.g., GDPR, HIPAA).