- You will act as an ML Data Engineer focused on Spark applications, responsible for supporting and implementing code to build up the ML development and production process.
- You will work with a team of data scientists and data engineers to support the ongoing MLOps process at the Security Operations Center. For this reason, you will be asked to assess the Big Data technology currently in use and to test new technologies to evolve the system, especially on public cloud infrastructure.
- Integrate machine learning models into a Hadoop-based big data environment using Apache Spark.
- Work with data scientists to develop parts of their code to interface with other components and to improve its efficiency.
- Liaise with the DevOps team that maintains the underlying hardware and supporting technologies.
- Implement automation and monitoring tools with a DevSecOps mindset.
- Work with the Architecture Team to test new technologies and plan the evolution of the Big Data system.
- Develop a log enrichment pipeline capable of operating at massive scale.
- Configure Kafka and other components to handle data pipelines.
- Maintain up-to-date documentation on systems and procedures.
- University degree in Computer Science or five years’ equivalent experience in Information Technology.
- Proven experience with Apache Spark and Kafka.
- Previous experience writing code for use at scale (i.e. ‘big data’).
- Experience deploying code in the Apache Hadoop ecosystem.
- Strong ability to develop in Python.
- Ability to code in Java and/or Scala is highly desirable.
- Experience with CI/CD pipelines.
- Experience with a data science platform is highly desirable.
- Experience with AWS.
- Experience with information security applications (e.g. SIEM) is a benefit.
- Proven track record of developing effective and efficient real-time big data processing pipelines.
- Skilled thinker: proactive, highly resourceful, detail-oriented, and a team player.
- Able to track work by following agile methodologies.