We are seeking a highly skilled Senior Data Engineer to join ML Data Group and help drive the development and optimization of our cutting-edge data infrastructure. As a key member of the Platform team, you will play an instrumental role in building and evolving our feature store data pipeline, enabling machine learning teams to efficiently access and work with high-quality, real-time data at scale.
In this dynamic, fast-paced environment, you will collaborate with other data professionals to create robust, scalable data solutions. You will be responsible for architecting, designing, and implementing data pipelines that ensure reliable data ingestion, transformation, and storage, ultimately supporting the production of high-performance ML models.
We are looking for data-driven problem-solvers who thrive in ambiguous, fast-moving environments and are passionate about building data systems that empower teams to innovate and scale. We value independent thinkers with a strong sense of ownership, who can take challenges from concept to production while continuously improving our data infrastructure.
As a Data Engineer you will...
Design and implement large-scale batch & streaming data pipelines infrastructure
Build and optimize data workflows for maximum reliability and performance
Develop solutions for real-time data processing and analytics
Implement data consistency checks and quality assurance processes
Design and maintain state management systems for distributed data processing
Take a crucial role in building the group's engineering culture, tools, and methodologies
Define abstractions, methodologies, and coding standards for the entire Data Engineering pipeline.
Requirements: 5+ years of experience as a Software Engineer with focus on data engineering
Expert knowledge in building and maintaining data pipelines at scale
Strong experience with stream/batch processing frameworks (e.g. Apache Spark, Flink)
Profound understanding of message brokers (e.g. Kafka, RabbitMQ)
Experience with data warehousing and lake technologies
Strong Python programming skills and experience building data engineering tools
Experience with designing and maintaining Python SDKs
Proficiency in Java for data processing applications
Understanding of data modeling and optimization techniques
Bonus Points
Experience with ML model deployment and maintenance in production
Knowledge of data governance and compliance requirements
Experience with real-time analytics and processing
Understanding of distributed systems and cloud architectures
Experience with data visualization and lineage tools/frameworks and techniques.
This position is open to all candidates.