We are looking for an experienced DevOps and Site Reliability Engineer to join our team. In this role, you will:
Manage a portfolio of customer-facing cloud services (SaaS/IaaS), ensuring overall availability, performance, and security.
Administer the cloud-based environments that support our SaaS/IaaS offerings, which run on our Kubernetes-based architecture.
Automate repetitive and error-prone tasks and processes using tools such as Ansible and Terraform and scripting languages such as Python.
Monitor production environments for issues using tools like Prometheus, SigNoz, and Elasticsearch.
Continuously measure availability, latency, and system health using tools like Grafana, Pingdom, and others.
Secure the environment against threats by deploying patches and enforcing least-privilege configurations.
Respond to incidents and drive changes that prevent them from recurring, including automating recovery processes.
Design and implement tools for automated deployment across multiple environments like AWS, Google Cloud, and Azure.
The role also includes participating in a rotational on-call duty schedule, providing 24/7 coverage.
Requirements. You have:
At least three years of experience as a DevOps engineer.
A Bachelor of Science degree in Computer Science, Information Technology, or a related field (preferred).
Familiarity with Linux and Windows operating systems.
Strong familiarity with Kubernetes and Docker.
The ability to write automation scripts and a good understanding of at least two programming/scripting languages such as Bash, Python, or Go.
Proficiency in infrastructure as code and automation tools such as Terraform and Ansible.
An understanding of public cloud vendors such as AWS, Azure, Google Cloud, or others.
A natural curiosity and high motivation, with a passion for ensuring scalable, performant, and highly available solutions.
Excellent debugging skills and a love of solving technical problems throughout the technology stack.
Fluency in English.
This position is open to all candidates.