Data Scientist or Data Engineer
Duration: Until October 31st
Extension: Possible
Number of positions: 1
Location: Toronto – Hybrid (Candidate must be willing to be on site)
Purpose
The Data Management and Quality team is responsible for Enterprise Data Quality Governance and Issue Management activities throughout the Enterprise. Our mandate is to enable and establish a culture of Data Governance through continuous measurement, monitoring and controls applied throughout the data lifecycle. The Data Quality Innovation role is responsible for developing, executing and enhancing the existing data quality framework through the use of advanced analytics, in line with business priorities. The incumbent will work closely with lines of business, data stewards and internal stakeholders to drive business value by identifying data quality issues using AI/ML approaches.
Responsibilities
• Work with large data sets using distributed computing technologies (MapReduce, Hadoop, Hive, Apache Spark, etc.)
• Profile and analyze source data to identify opportunities for data quality interventions
• Work with business stakeholders to understand the problem statement, then develop machine learning algorithms and prototype them for execution in Hadoop and cloud environments
• Work with data engineers to brainstorm solutions to problems and support others in their goals
• Collaborate with data engineers to deploy production scale solutions
• Use sound agile development practices (code reviews, testing, etc.) to develop and deliver data products
• Provide day-to-day support and technical expertise to both technical and non-technical teams
• Exhibit sound judgement, a keen eye for detail and tenacity in solving difficult problems
• Use strong analytical skills to support the use of data for sound decision making
• Translate business needs into technical requirements
• Champion a high-performance environment and contribute to an inclusive work environment
• Participate in knowledge transfer within the team and business units, and identify and recommend opportunities to enhance the productivity, effectiveness and operational efficiency of the business unit and/or team
• Monitor project progress by tracking activity, resolving problems, publishing progress reports and recommending actions
• Liaise with multiple stakeholders from various business lines
Must have
• 3+ years of industry experience as a Data Scientist or Data Engineer, with solid knowledge of general Machine Learning concepts in both theory and application
• Proficiency in a variety of languages: Python, R, Scala, Java (Python preferred)
• Strong proficiency in SQL
• 3+ years of industry experience with big data technologies (Hadoop, Hive, Spark, SQL, Kafka, etc.); Spark and Hive preferred
• 2+ years of experience with, and understanding of, industry Data Quality processes and practices
• Experience with data preprocessing, feature engineering, and anomaly and outlier detection
Nice to have
• Familiarity with container environments: Docker, OpenShift
• Familiarity with DevOps processes, pipelines, and tooling
Soft skills:
• Strong oral and written communication skills; comfortable presenting to a variety of audiences
• Experience working in an Agile team environment
• Self-motivated to solve problems and able to work collaboratively with other team members
• Highly detail-oriented and skilled at summarizing and gleaning insights from large amounts of information
Education:
• Degree in Engineering, Computer Science or Mathematics/Statistics
Interview
Two-step interview
1. First interview: 30-minute video conference (behavioural and technical questions) with the hiring manager (HM)
2. Second interview: 1-hour video conference (technical questions) with the hiring manager (HM) and team members