Overview

We are building a resilient cloud platform and need a Lead DevOps Engineer to guide AWS infrastructure, Terraform automation, and Kubernetes operations. You will shape CI/CD, reliability, observability, and security while supporting databases, messaging, and storage across distributed systems. Apply today.

Responsibilities

  • Design and govern scalable cloud infrastructure that meets organizational needs
  • Build automation for deployments and configuration management
  • Monitor and sustain system health and performance to protect uptime
  • Troubleshoot and fix issues across distributed systems
  • Collaborate with development teams to improve CI/CD processes
  • Strengthen infrastructure as code practices to increase reliability and repeatability
  • Enforce and uphold security protocols across all environments
  • Support and maintain database, messaging, and storage systems
  • Implement and operate observability tooling for logging, monitoring, and alerting
  • Participate in on-call rotations and respond to operational incidents

Requirements

  • At least 5 years of experience in DevOps or a related engineering field
  • Minimum of 1 year in a leadership or team management position
  • Advanced proficiency with Amazon Web Services for cloud infrastructure operations
  • Strong Bash scripting skills for automation and system management
  • Proven experience in building and supporting CI/CD pipelines
  • In-depth knowledge of Kubernetes for container orchestration and management
  • Expertise in observability and troubleshooting within distributed environments
  • Hands-on experience with Terraform for infrastructure as code
  • English language skills (written and spoken) at B2+ level or higher

Nice to have

  • Background with Amazon Aurora for managing relational databases
  • Experience with AWS Lambda for building serverless solutions
  • Understanding of Amazon API Gateway for API lifecycle management
  • Familiarity with Amazon CloudFront for content delivery network services
  • Skills in Amazon CloudWatch for monitoring and logging
  • Experience with Amazon Elastic Kubernetes Service (EKS) for managed Kubernetes clusters
  • Knowledge of Amazon Managed Grafana and Amazon Managed Service for Prometheus for observability and monitoring
  • Experience with Amazon OpenSearch Service for search and analytics
  • Familiarity with Amazon RDS and Amazon S3 for database and storage solutions
  • Experience with Argo CD for GitOps deployment workflows
  • Understanding of Azure DevOps for CI/CD and project management
  • Skills in Fluent Bit and OpenTelemetry for log and trace collection
  • Experience with PowerShell and Python for scripting and automation

Benefits

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn