We are looking for a Senior Data Platform Engineer to design, build, and scale our next-generation data platform, the backbone powering our AI-driven insights.
This role sits at the intersection of data engineering, infrastructure, and MLOps, owning the architecture and reliability of our data ecosystem end-to-end.
You'll work closely with data scientists, R&D teams, and analysts to create a robust platform that supports varying use cases, complex ingestion, and AI-powered analytics.
Responsibilities:
Architect and evolve a scalable, cloud-native data platform that supports batch, streaming, analytics, and AI/LLM workloads across R&D.
Help define and implement standards for how data is modeled, stored, governed, and accessed
Design and build data lakes and data warehouses
Develop and maintain complex, reliable, and observable data pipelines
Implement data quality, validation, and monitoring frameworks
Collaborate with ML and data science teams to connect AI/LLM workloads to production data pipelines, enabling RAG, embeddings, and feature engineering flows.
Manage and optimize relational and non-relational datastores (Postgres, Elasticsearch, vector DBs, graph DBs).
Build internal tools and self-service capabilities that enable teams to easily ingest, transform, and consume data.
Contribute to data observability, governance, documentation, and platform visibility
Drive strong engineering practices
Evaluate and integrate emerging technologies that enhance scalability, reliability, and AI integration in the platform.
Requirements:
7+ years of experience building and operating data platforms
Strong Python programming skills
Proven experience with cloud data lakes and warehouses (Databricks, Snowflake, or equivalent).
Data orchestration experience (Airflow)
Solid understanding of AWS services
Proficiency with relational databases and search/analytics stores
Experience designing complex data pipelines, managing data quality, lineage, and observability in production.
Familiarity with CI/CD, GitOps, and IaC
Excellent understanding of distributed systems, data partitioning, and schema evolution.
Strong communication skills and the ability to document and present technical designs clearly.
Advantages:
Experience with vector databases and graph databases
Experience integrating AI/LLM workloads into data pipelines (feature stores, retrieval pipelines, embeddings).
Familiarity with event streaming and CDC patterns.
Experience with data catalog, lineage, or governance tools
Knowledge of monitoring and alerting stacks
Hands-on experience with multi-source data product architectures.
This position is open to all candidates.