Overview
We are looking for a Senior Site Reliability Engineer to join our dynamic and growing team supporting the Customer Last Mile area and Order Services. In this role, you will bring deep expertise in AWS Bedrock and OpenSearch (index and performance tuning) to ensure the reliability, scalability, and performance of our critical microservices ecosystem.
Responsibilities
- Own production environments, including on-call coverage and major incident handling
- Lead root cause analysis and drive problem management to closure
- Define and maintain SLOs/SLIs while promoting a reliability-first mindset across teams
- Operate and optimize containerized workloads in AWS (Kubernetes on EKS, and ECS)
- Manage infrastructure as code using Terraform and Ansible
- Implement and maintain monitoring, alerting, and observability solutions with Instana, CloudWatch, and ELK
- Perform log analysis, alert hygiene, and capacity planning
- Support reliability patterns for CLM microservices, including APIs and async/event-driven processing
- Tune and maintain AWS Bedrock workloads and OpenSearch indexes for optimal performance
- Apply secure-by-design principles across all infrastructure and services
- Drive automation-first practices, documentation, and cross-team collaboration
- Participate in the on-call support rotation, covering one calendar week approximately once per month
Requirements
- 3+ years of experience in Site Reliability Engineering or related operations roles
- Expertise in AWS Bedrock and OpenSearch with a focus on index and performance tuning
- Proficiency in AWS fundamentals, including EC2, EKS/ECS, IAM, and networking
- Background in Kubernetes operations at production scale
- Skills in infrastructure as code with Terraform
- Competency in observability tooling such as Instana, CloudWatch, and ELK
- Understanding of microservices reliability patterns, APIs, and async/event-driven processing
- Knowledge of SLO/SLI definition, RCA methodologies, and problem management practices
- Familiarity with secure-by-design principles and operational security
- Capability to handle production ownership, on-call duties, and major incident response
- Strong collaboration, documentation, and automation-first mindset
- English proficiency at a B2 level to ensure effective communication and documentation
Nice to have
- Experience using Ansible for configuration management
- Demonstrated experience with advanced capacity planning and alert hygiene practices
- Experience tuning large-scale search and AI/ML platform workloads
Ukraine
With us you can:
- Work on a flexible schedule remotely or from any of our comfortable offices or coworking spaces in Ukraine
- Receive the necessary equipment to perform your work tasks
- Change projects and technology stacks within EPAM
- Gain experience in various business domains (Insurance, E-commerce, Healthcare, Finance, Travel, Media, Artificial Intelligence, and more)
- Explore relocation opportunities, which may be available for eligible candidates depending on the role and openings at other EPAM locations
- Participate in volunteer and charity programs and in communities (both technical and interest-based)
We focus on your professional growth:
- Plan your individual career path together with your manager
- Receive regular feedback from colleagues
- Improve your English for free with certified teachers (Speaking Clubs, client interview preparation courses, etc.)
- Take free training and certification in AWS, GCP, or Azure
- Use the internal E-learn training program (18,200+ specialized training and mentoring programs)
- Access corporate accounts on LinkedIn Learning, getAbstract, and other partner resources
- Study at EPAM Solution Architecture School with instructors who are practicing architects
- Develop as a leader through the Delivery Management, Resource Management, and Leadership Essentials schools, and more
- Participate in internal communities (500+ meetups, technical discussions, brainstorming sessions, online events and conferences annually)
What we offer:
- Vacation and sick leave (including sick leave without a medical certificate)
- A wide range of Voluntary Medical Insurance programs providing both medical treatment and various preventive options (including sports activities)
- Medical insurance for family members at corporate rates
- Company support during significant life events (childbirth or adoption, marriage, etc.)
- Support for psychological comfort: discounts on services from mental health specialists or coaches, thematic training
- E-kids program - a free programming language training program for EPAMers' children
Please note that the set of benefits, including learning, certification, and other opportunities, may vary depending on the role you apply for. Our recruiter can share more details about the specific opportunity during your general interview.
About EPAM
EPAM strives to provide its global team of over 62,350 professionals in more than 55 countries with opportunities for professional growth from day one of collaboration. Our colleagues are the source of EPAM's success, so we value cooperation, strive to always understand our clients' business and aim for the highest quality standards. No matter where you are, you will join a dedicated, diverse community that will help you realize your potential to the fullest.