Overview
We are seeking a hands-on Lead DevOps Engineer to strengthen our Kubernetes platform operations and CI/CD ecosystem.
The engineer will actively contribute to scaling cloud-native infrastructure, improving deployment pipelines, enforcing Infrastructure as Code (IaC) standards, and enhancing operational resilience. This is a production-facing role requiring strong troubleshooting skills, an ownership mindset, and practical experience operating mission-critical workloads in AWS.
Responsibilities
- Manage and support Kubernetes clusters running production workloads
- Advance cluster scalability, reliability, and performance by applying resource management, autoscaling, and workload isolation techniques
- Boost observability through the integration of metrics, logging, and tracing for improved operational insight
- Facilitate onboarding and enablement for multiple teams utilizing the platform
- Develop, restructure, and expand GitHub Actions pipelines to enhance modularity, maintainability, and governance
- Create reusable workflows and uphold standards across code repositories
- Lower deployment risks by automating validation and testing procedures
- Maximize pipeline efficiency and reduce operational costs
- Build and maintain Terraform-based infrastructure, focusing on robust state management, modularity, and version control
- Uphold IaC governance and conduct review processes
- Assist with environment provisioning and lifecycle management
- Administer and optimize AWS services including networking, IAM, compute, and storage
- Strengthen secrets management and secure configuration practices
- Participate in cost-saving initiatives for cloud resources
- Maintain production stability and reinforce operational resilience
- Enhance access controls and secure handling of sensitive information
- Apply DevOps and SRE methodologies to production systems
- Engage in incident resolution and root cause analysis
- Lead efforts to improve system reliability and operational maturity
Requirements
- At least 5 years of experience in DevOps positions with a focus on cloud technologies
- At least one year of experience leading and managing development teams
- Extensive hands-on experience with Kubernetes in production environments
- Demonstrated ability to manage AWS cloud infrastructure for critical workloads
- Advanced proficiency with GitHub Actions or similar CI/CD tools for pipeline automation
- Strong skills with Terraform for Infrastructure as Code, including state management and modular design
- Experience deploying and operating cloud-native systems in production
- Exceptional troubleshooting and debugging abilities for complex technical challenges
- Solid understanding of DevOps and SRE concepts, including reliability engineering patterns
- Strong written and spoken English communication skills (B2+ level or higher)
Nice to have
- Background working in regulated or healthcare sectors, with knowledge of compliance and security standards
- Experience using observability tools like Prometheus and Grafana for monitoring and alerting
- Familiarity with cost optimization methods in AWS to enhance resource utilization
- Understanding of platform engineering principles and internal developer platforms to support team productivity
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn