Overview
We are building a resilient cloud platform and need a Lead DevOps Engineer to guide AWS infrastructure, Terraform automation, and Kubernetes operations. You will shape CI/CD, reliability, observability, and security while supporting databases, messaging, and storage across distributed systems. Apply today.
Responsibilities
- Design and govern scalable cloud infrastructure that meets organizational needs
- Build automation for deployments and configuration management
- Monitor and sustain system health and performance to protect uptime
- Troubleshoot and fix issues across distributed systems
- Collaborate with development teams to improve CI/CD processes
- Strengthen infrastructure as code practices to increase reliability and repeatability
- Enforce and uphold security protocols across all environments
- Support and maintain database, messaging, and storage systems
- Implement and operate observability tooling for logging, monitoring, and alerting
- Participate in on-call rotations and respond to operational incidents
Requirements
- At least 5 years of experience in DevOps or a related engineering field
- Minimum of 1 year in a leadership or team management position
- Advanced proficiency with Amazon Web Services for cloud infrastructure operations
- Strong Bash scripting skills for automation and system management
- Proven experience in building and supporting CI/CD pipelines
- In-depth knowledge of Kubernetes for container orchestration and management
- Expertise in observability and troubleshooting within distributed environments
- Hands-on experience with Terraform for infrastructure as code
- English language skills (written and spoken) at B2+ level or higher
Nice to have
- Experience with Amazon Aurora for managing relational databases
- Experience with AWS Lambda for building serverless solutions
- Understanding of Amazon API Gateway for API lifecycle management
- Familiarity with Amazon CloudFront for content delivery network services
- Experience with Amazon CloudWatch for monitoring and logging
- Experience with Amazon Elastic Kubernetes Service (EKS) for managed Kubernetes clusters
- Knowledge of Amazon Managed Grafana and Amazon Managed Service for Prometheus for observability and monitoring
- Experience with Amazon OpenSearch Service for search and analytics
- Familiarity with Amazon RDS and Amazon S3 for database and storage solutions
- Experience with Argo CD for GitOps deployment workflows
- Understanding of Azure DevOps for CI/CD and project management
- Experience with Fluent Bit and OpenTelemetry for log and trace collection
- Experience with PowerShell and Python for scripting and automation
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn