Overview

We are seeking a highly skilled Senior Data Platform Operations Engineer to join our team.

In this role, you will maintain the stability, security, performance, and cost-effectiveness of our global enterprise data platform. You will provide 8/5 operational coverage within a follow-the-sun 24x5 support model, ensuring the platform reliably supports business operations worldwide.

Responsibilities

  • Maintain and support a stable, secure, and high-performing enterprise data platform, including Snowflake, AWS data stack, dbt, orchestration tools, and BI/analytics solutions
  • Deliver operational coverage in an 8/5 support model and participate in a 24/7 on-call rotation for critical incidents
  • Implement and manage robust monitoring, alerting, and observability systems for proactive incident detection and resolution
  • Perform platform upgrades, patching, and configuration management in alignment with security and compliance standards
  • Continuously optimize system performance to meet changing business requirements
  • Utilize comprehensive observability frameworks to monitor infrastructure, data pipelines, and platform services
  • Provide actionable operational insights through dashboards and reporting tools
  • Identify and automate processes to improve efficiency and minimize manual intervention
  • Recommend and execute ongoing improvements to enhance platform resilience, scalability, and cost efficiency
  • Contribute to infrastructure-as-code and configuration-as-code initiatives for consistent and repeatable operations

Requirements

  • At least 3 years of experience in professional software engineering roles
  • Hands-on expertise managing cloud-native data platforms, with Snowflake experience required
  • Proficiency in AWS cloud infrastructure, focusing on operations, automation, and cost management
  • Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, ELK, or CloudWatch
  • Knowledge of Infrastructure as Code tools like Terraform, Pulumi, or Ansible, and configuration management best practices
  • Strong understanding of networking, security, and compliance in cloud environments
  • Excellent problem-solving abilities with a proactive, service-oriented approach
  • Ability to work in a global operations setting with on-call responsibilities
  • Effective communication and collaboration skills for working with engineering, data, and business teams
  • Dedication to continuous improvement and operational excellence
  • Fluent written and spoken English (B2+ level or higher)

Nice to have

  • Experience implementing FinOps frameworks and cost optimization strategies for cloud environments
  • Background in regulated industries such as pharma, healthcare, or finance, with experience in compliance-driven operations
  • Familiarity with modern data stack tools including dbt, Dagster, Airflow, ThoughtSpot, Tableau, or Power BI
  • Exposure to Site Reliability Engineering (SRE) principles and practices
  • Experience with Databricks, BigQuery, or similar data platforms

Benefits

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn