Are you passionate about ensuring system reliability, scalability, and performance? Do you thrive in a dynamic environment where automation and operational excellence are key?
We are looking for a Site Reliability Engineer (SRE) to join our team and play a crucial role in designing, implementing, and maintaining our cloud-based infrastructure. In this role, you will collaborate across teams to drive automation, improve system resilience, and optimize performance while fostering a culture of reliability.
Responsibilities:
System Reliability Ensure high availability and performance of services through effective monitoring, incident management, and root cause analysis.
Automation & Tooling Develop and maintain automation for infrastructure provisioning, configuration management, and application deployment.
Performance Optimization Analyze and enhance system performance, including load balancing, caching, and database tuning. Conduct regular capacity planning.
Incident Response & Troubleshooting Lead incident response efforts, participate in on-call rotations, and troubleshoot complex infrastructure issues.
Security & Compliance Collaborate with security teams to implement best practices and ensure compliance with relevant standards (ISO 27001, SOC 2, etc.).
Collaboration & Mentorship Work closely with developers, DevOps, Support, and product teams to enhance application reliability and implement SRE best practices.
Requirements: Requirements:
5+ years in site reliability engineering, DevOps, or related roles.
Proven experience managing large-scale, cloud-based infrastructure in GCP, AWS, or Azure.
Expertise in container orchestration (Kubernetes, Docker) and microservices architecture.
Strong proficiency in scripting and programming languages (Python, Go, Bash, etc.).
Experience with CI/CD pipelines, infrastructure as code (Terraform, CloudFormation), and configuration management (Ansible, Puppet, Chef).
Hands-on experience with monitoring and observability tools (Datadog, Prometheus, Grafana, ELK Stack).
Deep understanding of networking concepts, DNS, load balancing, and distributed systems.
Strong problem-solving skills, excellent communication, and a proactive mindset.
Advantages:
Certifications AWS Certified Solutions Architect, GCP Professional Cloud Architect, or Kubernetes certifications (CKA, CKAD).
This position is open to all candidates.