Overview

We are seeking an experienced Lead Site Reliability Engineer to spearhead our infrastructure reliability initiatives and guide a team of talented engineers. In this role, you will shape technical strategy, mentor team members and drive operational excellence across our cloud-based platforms and distributed services.

Responsibilities

  • Lead the design and evolution of resilient, scalable infrastructure across multiple cloud providers
  • Mentor and guide a team of engineers, fostering technical growth and best practices
  • Define reliability standards, SLOs and operational policies for production environments
  • Architect automation frameworks to streamline deployments and infrastructure management
  • Oversee CI/CD strategy and ensure efficient software delivery workflows
  • Coordinate incident response efforts and lead post-mortem analyses to prevent recurrence
  • Partner with engineering leadership to align reliability goals with business priorities
  • Champion observability practices to enhance system visibility and proactive issue detection
  • Provide technical direction for microservices and event-driven architecture initiatives
  • Evaluate emerging tools and technologies to enhance the reliability ecosystem
  • Drive capacity planning, cost optimization and performance tuning across platforms

Requirements

  • 5+ years of experience in DevOps or Site Reliability Engineering
  • Expertise in AWS, Azure and GCP
  • Competency in Kubernetes, Terraform and Ansible
  • Skills in GitHub and Jenkins
  • Knowledge of microservices, APIs and event-driven processing
  • Strong written and verbal English communication skills (B2+)

Benefits (Mexico, Remote)

  • Career plan and real growth opportunities
  • Unlimited access to LinkedIn Learning solutions
  • Constant training, mentoring, online corporate courses, eLearning and more
  • English classes with a certified teacher
  • Support for employee initiatives (Algorithms Club, Toastmasters, Agile Club and more)
  • Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
  • Flexible work schedule and dress code
  • Collaborate in a multicultural environment and share best practices from around the globe
  • Direct hire by EPAM, 100% on payroll
  • Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
  • Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
  • 13% employee savings fund, capped at the legal limit
  • Grocery coupons
  • 30-day December bonus
  • Employee Stock Purchase Plan
  • 12 vacation days
  • Official Mexican holidays, plus 5 extra holidays (Maundy Thursday, Good Friday, November 2nd, December 24th and 31st)
  • Monthly non-taxable allowance for electricity and internet bills

About EPAM

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a myriad of innovative projects that deliver creative, cutting-edge solutions, and have the opportunity to learn and grow continuously. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Personal Data

By applying for this role, you agree that your personal data may be used as set out in EPAM's Privacy Notice and Policy.