Overview
We are seeking an experienced Lead Site Reliability Engineer to spearhead our infrastructure reliability initiatives and guide a team of talented engineers. In this role, you will shape technical strategy, mentor team members and drive operational excellence across our cloud-based platforms and distributed services.
Responsibilities
- Lead the design and evolution of resilient, scalable infrastructure across multiple cloud providers
- Mentor and guide a team of engineers, fostering technical growth and best practices
- Define reliability standards, SLOs and operational policies for production environments
- Architect automation frameworks to streamline deployments and infrastructure management
- Oversee CI/CD strategy and ensure efficient software delivery workflows
- Coordinate incident response efforts and lead post-mortem analyses to prevent recurrence
- Partner with engineering leadership to align reliability goals with business priorities
- Champion observability practices to enhance system visibility and proactive issue detection
- Provide technical direction for microservices and event-driven architecture initiatives
- Evaluate emerging tools and technologies to enhance the reliability ecosystem
- Drive capacity planning, cost optimization and performance tuning across platforms
Requirements
- 5+ years of experience in DevOps or Site Reliability Engineering
- Expertise in AWS, Azure and GCP
- Competency in Kubernetes, Terraform and Ansible
- Skills in GitHub and Jenkins
- Knowledge of microservices, APIs and event-driven processing
- Strong written and verbal English communication skills (B2+)
Mexico (Remote)
Benefits
- Career plan and real growth opportunities
- Unlimited access to LinkedIn Learning solutions
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (algorithms club, Toastmasters, agile club and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Direct hire by EPAM, 100% on payroll
- Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday, Good Friday, November 2nd, December 24th and 31st)
- Monthly non-taxable allowance for electricity and internet bills
About EPAM
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Personal Data
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.