Overview

We are seeking a Senior Site Reliability Engineer to ensure the operational excellence and reliability of our production services. This role combines core SRE responsibilities with a specialization in generative AI technologies, focusing on AWS infrastructure, Kubernetes orchestration and observability platforms to support mission-critical systems.

Participation in the on-call support rotation is required for this role. The schedule is organized on a rotating basis, with each engineer covering one calendar week approximately once per month.

Responsibilities

  • Provide operational support for production services, including on-call rotation and major incident handling
  • Define, monitor and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure reliability
  • Manage and operate AWS infrastructure, particularly Kubernetes clusters, using Infrastructure as Code
  • Ensure the reliability and performance of microservices and event-driven architectures
  • Manage, tune and optimize search and observability platforms, with a specific focus on OpenSearch performance
  • Conduct root cause analysis (RCA) and drive problem management to prevent recurring issues
  • Take ownership of production environments and reliability outcomes
  • Collaborate with engineering teams to embed a reliability mindset across the organization

Requirements

  • 3+ years of experience in Site Reliability Engineering or related operational roles
  • Expertise in AWS services including EC2, EKS and ECS
  • Proficiency in Amazon Bedrock and OpenSearch
  • Knowledge of IAM and AWS infrastructure management
  • Skills in Infrastructure as Code using Terraform
  • Background in container orchestration with Kubernetes
  • Familiarity with observability tools such as Instana, CloudWatch and ELK
  • Understanding of microservices, APIs and event-driven processing
  • Strong skills in RCA and problem management
  • Competency in SLO/SLI definition and reliability engineering practices
  • Upper-Intermediate English language proficiency (B2)

Nice to have

  • Familiarity with Ansible for configuration management

Latvia (Prod)

  • Engineering Heritage: Best-in-class experts sharing a culture of engineering excellence and tackling complex engineering challenges for over 30 years.
  • Advanced Tech Stack: Innovative projects where you can apply or enhance your expertise in Cloud, Data, AI, and other emerging technologies.
  • World-Class Clients: Work closely with 340+ of the Forbes Global 2000 on creating disruptive solutions that make a global impact.
  • Professional Growth: Exceptional support for career development with comprehensive resources for upskilling or reskilling in pioneering practices.
  • GenAI Community: Strong AI competencies with 600+ experts across 55+ locations driving GenAI-enabled transformation journeys.
  • Entrepreneurial Culture: If you're passionate and dedicated to improving business transformation, we provide the support you need to bring your ideas to life.
  • Hybrid Setup: The flexibility to work from any location in Latvia, whether it's your home or our office in Riga.
  • Other Benefits: Additional vacation and trust days, private health insurance, Employee Stock Purchase Plan and more.

Lithuania (Prod)

  • Engineering Heritage: Best-in-class experts sharing a culture of engineering excellence and tackling complex engineering challenges for over 30 years.
  • Advanced Tech Stack: Innovative projects where you can apply or enhance your expertise in Cloud, Data, AI, and other emerging technologies.
  • World-Class Clients: Work closely with 340+ of the Forbes Global 2000 on creating disruptive solutions that make a global impact.
  • Professional Growth: Exceptional support for career development with comprehensive resources for upskilling or reskilling in pioneering practices.
  • GenAI Community: Strong AI competencies with 600+ experts across 55+ locations driving GenAI-enabled transformation journeys.
  • Entrepreneurial Culture: If you're passionate and dedicated to improving business transformation, we provide the support you need to bring your ideas to life.
  • Hybrid Setup: The flexibility to work from any location in Lithuania, whether it's your home or our dynamic offices in Vilnius and Kaunas.
  • Other Benefits: Additional vacation and trust days, private health insurance, Employee Stock Purchase Plan and more.

Latvia (Remote)

Feel free to work remotely from anywhere across Latvia or connect with colleagues at our Riga office.

Lithuania (Remote)

Feel free to work remotely from anywhere across Lithuania or connect with colleagues at our Vilnius and Kaunas offices.

Latvia (Salary)

Salary range €3.8K-€5.7K gross, based on your experience and interview results.

Lithuania (Salary)

Salary range €3.2K-€5K gross, based on your experience and interview results.

Latvia (About EPAM)

EPAM is a leading global provider of digital platform engineering and development services. For over 30 years, our team has helped leading brands navigate the waves of digital transformation, building solutions that help them stay competitive through constant market disruption. With offices in 55+ countries, EPAM has grown in Latvia to more than 150 talented innovators in 4 years. We foster creativity and unconventional ways of doing things, welcoming like-minded professionals to join us.

Lithuania (About EPAM)

EPAM is a leading global provider of digital platform engineering and development services. For over 30 years, our team has helped leading brands navigate the waves of digital transformation, building solutions that help them stay competitive through constant market disruption. With offices in 55+ countries, EPAM has grown in Lithuania to more than 1,300 talented innovators in just 5 years. We foster creativity and unconventional ways of doing things, welcoming like-minded professionals to join us.