Overview

We are seeking a hands-on Lead DevOps Engineer to strengthen our Kubernetes platform operations and CI/CD ecosystem.

The engineer will actively contribute to scaling cloud-native infrastructure, improving deployment pipelines, enforcing Infrastructure as Code (IaC) standards, and enhancing operational resilience. This is a production-facing role that requires strong troubleshooting skills, an ownership mindset, and practical experience operating mission-critical workloads in AWS.

Responsibilities

  • Manage and support Kubernetes clusters running production workloads
  • Advance cluster scalability, reliability, and performance by applying resource management, autoscaling, and workload isolation techniques
  • Boost observability through the integration of metrics, logging, and tracing for improved operational insight
  • Facilitate onboarding and enablement for multiple teams utilizing the platform
  • Develop, restructure, and expand GitHub Actions pipelines to enhance modularity, maintainability, and governance
  • Create reusable workflows and uphold standards across code repositories
  • Lower deployment risks by automating validation and testing procedures
  • Maximize pipeline efficiency and reduce operational costs
  • Build and maintain Terraform-based infrastructure, focusing on robust state management, modularity, and version control
  • Uphold IaC governance and conduct review processes
  • Assist with environment provisioning and lifecycle management
  • Administer and optimize AWS services including networking, IAM, compute, and storage
  • Strengthen secrets management and secure configuration practices
  • Participate in cost-saving initiatives for cloud resources
  • Maintain production stability and reinforce operational resilience
  • Enhance access controls and secure handling of sensitive information
  • Apply DevOps and SRE methodologies to production systems
  • Engage in incident resolution and root cause analysis
  • Lead efforts to improve system reliability and operational maturity

Requirements

  • At least 5 years of experience in DevOps positions with a focus on cloud technologies
  • At least one year of experience leading and managing development teams
  • Extensive hands-on experience with Kubernetes in production environments
  • Demonstrated ability to manage AWS cloud infrastructure for critical workloads
  • Advanced proficiency with GitHub Actions or similar CI/CD tools for pipeline automation
  • Strong skills with Terraform for Infrastructure as Code, including state management and modular design
  • Experience deploying and operating cloud-native systems in production
  • Exceptional troubleshooting and debugging abilities for complex technical challenges
  • Solid understanding of DevOps and SRE concepts, including reliability engineering patterns
  • Excellent written and spoken English communication skills (B2+ level or higher)

Nice to have

  • Background working in regulated or healthcare sectors, with knowledge of compliance and security standards
  • Experience using observability tools like Prometheus and Grafana for monitoring and alerting
  • Familiarity with cost optimization methods in AWS to enhance resource utilization
  • Understanding of platform engineering principles and internal developer platforms to support team productivity

Benefits

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn