Overview

We are seeking a Senior Operational Intelligence Developer to join our dynamic team, focusing on maintaining and enhancing the Elastic & Observability Platform deployed across GCP and Elastic Cloud. This role involves managing platform operations, developing self-service capabilities, and collaborating with stakeholders to ensure optimal performance and reliability.

As part of this role, the successful candidate will participate in an on-call rotation dedicated to monitoring platform health and functionality. Weekday on-call duty spans business hours (Monday to Friday, 09:00–18:00), while weekend on-call involves one 48-hour shift every four weeks. Weekend on-call is passive by default, requiring action only if issues arise that affect platform health and performance.

Responsibilities

  • Ensure the availability, functionality, performance, and security of observability and search platforms in alignment with business SLAs
  • Provide incident response and resolution as the first point of escalation during on-call periods
  • Manage platform documentation, SOPs, and operational guidelines
  • Coordinate with internal stakeholders and vendors for installation, upgrades, and operational requirements
  • Design and develop platform features and self-service capabilities for customers
  • Deliver proofs of concept to improve platform operations, such as AI-driven enhancements or migration to Kubernetes
  • Maintain and evolve Infrastructure-as-Code automation for platform deployment and lifecycle management
  • Deploy, operate, and maintain scalable, highly available Elastic clusters
  • Plan and execute upgrades of Elastic Beats, Logstash, and other components, in coordination with the Image Factory team
  • Manage SSL certificate rotations, cluster capacity planning, cost optimization, and performance tuning
  • Configure and manage the ELK stack at all layers, including ingestion, indexing, and querying, with attention to performance
  • Implement alerting workflows, including Kibana Rules, Watchers, and PagerDuty integrations
  • Support data ingestion, enrichment, backup, and restoration processes

Requirements

  • Proven expertise in the implementation, operation, and maintenance of Elastic clusters, with at least 3 years of experience in related roles
  • Solid understanding of Infrastructure-as-Code and automation tools, including Terraform, Ansible, and Jenkins CI, along with Python scripting skills
  • Advanced troubleshooting and problem-solving skills to diagnose and resolve complex technical issues
  • Strong communication skills to convey technical concepts to both technical and non-technical stakeholders
  • English proficiency at B2 level or higher

Nice to have

  • Familiarity with chargeback automation and Elastic Synthetics enhancements
  • Understanding of AI-driven observability enhancements or Kubernetes migration
  • Background in integrating Uptrends and PagerDuty with Elastic components

Ukraine

With us you can:

  • Work on a flexible schedule remotely or from any of our comfortable offices or coworking spaces in Ukraine
  • Receive the necessary equipment to perform your work tasks
  • Change projects and technology stacks within EPAM
  • Gain experience in various business domains (Insurance, E-commerce, Healthcare, Finance, Travelling, Media, Artificial Intelligence, and more)
  • Explore relocation opportunities to other EPAM locations, where available, depending on eligibility, the role, and current openings
  • Participate in volunteer and charity programs, as well as in technical and interest-based communities

We focus on your professional growth:

  • Plan your individual career path together with your manager
  • Receive regular feedback from colleagues
  • Improve your English for free with certified teachers (Speaking Clubs, client interview preparation courses, etc.)
  • Get free training and certification on the AWS, GCP, or Azure cloud platforms
  • Use the internal E-learn training program (18,200+ specialized training and mentoring programs)
  • Access corporate accounts on LinkedIn Learning, getAbstract, and other partner resources
  • Study at the EPAM Solution Architecture School with instructors who are practicing architects
  • Develop as a leader by joining the Delivery Management, Resource Management, and Leadership Essentials schools, and more
  • Participate in internal communities (500+ meetups, technical discussions, brainstorming sessions, online events and conferences annually)

What we offer:

  • Vacation and sick leave (including sick leave without a medical certificate)
  • A wide range of Voluntary Medical Insurance programs providing both medical treatment and various preventive options (including sports activities)
  • Medical insurance for family members at corporate rates
  • Company support during significant life events (childbirth or adoption, marriage, etc.)
  • Support for psychological comfort: discounts on services from mental health specialists or coaches, and thematic training
  • E-kids program: free programming training for EPAMers' children


Please note that the set of benefits, including learning, certification, and other opportunities, may vary depending on the role you apply for. Our recruiter can share more details about the specific opportunity during your general interview.

About EPAM

EPAM strives to provide its global team of over 62,350 professionals in more than 55 countries with opportunities for professional growth from day one. Our colleagues are the source of EPAM's success, so we value collaboration, always strive to understand our clients' businesses, and aim for the highest quality standards. No matter where you are, you will join a dedicated, diverse community that will help you realize your full potential.