At BeeHero, we believe technology should do more than just make life easier; it should drive real-world impact. Our mission is to future-proof the global food supply through precision pollination. BeeHero is a data-driven company that has developed a unique platform leveraging low-cost IoT sensors and artificial intelligence to collect and analyze data on bee behavior and pollination patterns. This data empowers farmers and food producers with actionable insights to boost crop yields, improve crop quality, and support the health of our ecosystems. With BeeHero’s technology, farmers gain deep insights into the complex world of pollination and can make more informed decisions about how to manage their land and resources.
The role:
We are looking for a Data Engineer to design, build, and maintain the scalable data pipelines and infrastructure that power our data-driven decision-making. You will be part of the infrastructure development team and serve as the professional focal point for data engineering, guiding and influencing best practices across different domains. You will collaborate with data scientists, analysts, and other engineering teams to integrate and optimize data workflows, ensuring the availability of accurate, reliable, and secure data. If you are passionate about building robust data systems and thrive in a collaborative environment, we'd love to hear from you!
Responsibilities:
* Design, deploy, and manage data pipelines and storage solutions in cloud environments, particularly AWS.
* Design, develop, and maintain scalable and efficient ETL pipelines using Python.
* Integrate data from various sources (e.g., databases, APIs, cloud storage) into a unified data warehouse or data lake.
* Design, implement, and manage databases, data warehouses, and data lakes.
* Ensure database optimization and performance tuning for efficient data retrieval and storage.
* Implement data validation and cleaning processes to ensure the accuracy and quality of data.
* Work closely with data scientists and software engineering teams to ensure seamless data integration into applications and workflows.
* Continuously improve data infrastructure to support scalability and high-performance data processing.
* Automate recurring data-related tasks and workflows using scripting languages (e.g., Python, Bash) or tools (e.g., Apache Airflow).
* Proactively monitor and troubleshoot issues related to data pipelines, databases, and infrastructure.
Requirements:
* 3+ years of experience in data engineering or a related field
* 3+ years of experience working with data science teams to integrate and manage data workflows
* Proficiency in Python for data manipulation and scripting
* Strong knowledge of SQL for querying and data management
* Experience with AWS (e.g., S3, Redshift, EMR, Glue, Athena, Lambda) for cloud-based data processing and storage
* Experience with Apache Spark for large-scale data processing
* Experience with ETL pipelines (designing, building, and maintaining)
* Experience with Apache Airflow for workflow automation and orchestration
Advantages:
* Familiarity with Docker for containerization and deployment
* Experience with AWS CDK for defining cloud infrastructure as code
* Knowledge of BI tools (e.g., Tableau, Looker, Power BI) for data visualization and reporting
* Experience with Apache Iceberg for managing large datasets in cloud environments
* Experience with NoSQL databases (e.g., MongoDB, DynamoDB)
At BeeHero, you have the opportunity to be:
* Impactful: Your work will directly impact agricultural practices around the world.
This position is open to all candidates.