Overview

We are seeking a Lead Observability & SLI Engineer to design and implement observability and Service Level Indicators (SLIs) for real-time distributed platforms. The role focuses on engineering meaningful telemetry, embedding SLI checks into CI/CD, and turning metrics, logs, and traces into actionable reliability insights. This is an engineering role, not a traditional Ops or monitoring-setup position.

Responsibilities

  • Define and validate SLIs/SLOs for real-time platform services (EFX / RDD / ECB)
  • Embed SLI checks and observability gates into CI/CD and GitOps workflows
  • Build end-to-end platform insights by correlating metrics, logs and traces
  • Improve telemetry instrumentation across distributed services
  • Support incident analysis and root cause identification using telemetry data
  • Deliver production-ready observability components together with SRE teams

Requirements

  • 5+ years of hands-on experience with SLIs/SLOs (p95/p99 latency, error rates, error budgets)
  • Deep understanding of observability signals (metrics, logs, traces) and how they work together
  • Background in integrating observability into automated pipelines (CI/CD, GitOps)
  • Expertise in OpenTelemetry, Prometheus, and Grafana, or similar tools such as Datadog
  • Proficiency with cloud-native tooling: Kubernetes, containers, Terraform, and Helm
  • Strong system-level reasoning and troubleshooting skills

Nice to have

  • Proficiency in Python or Node.js (production code, not just scripting)

Poland

We gather like-minded people:

  • Engineering community of industry professionals
  • Friendly team and enjoyable working environment
  • Flexible schedule and the opportunity to work remotely within Poland
  • Chance to work abroad for up to 60 days annually
  • Business-driven relocation opportunities

We provide growth opportunities:

  • Outstanding career roadmap
  • Leadership development, career advising, soft skills, and well-being programs
  • Certification programs (GCP, Azure, AWS)
  • Unlimited access to LinkedIn Learning, getAbstract, and A Cloud Guru
  • English classes

We cover it all:

  • Stable income (Employment Contract or B2B)
  • Participation in the Employee Stock Purchase Plan
  • Benefits package (health insurance, multisport, shopping vouchers)
  • Strategically located offices featuring entertainment and relaxation zones, table tennis and football, free snacks, fantastic coffee, and more
  • Referral bonuses
  • Corporate, social and well-being events

Please note:

  • The set of bonuses may vary depending on the role you apply for; specifics will be discussed with our recruiter during the general interview.
  • Only selected candidates will be contacted.

About EPAM

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.