Overview

We are seeking a highly skilled and proactive Lead Incident Specialist to join our team.

This role is critical to ensuring service stability and rapid recovery across a 24x7 global support model. It is primarily aligned with the Americas timezone and supports after-hours and weekend operations as required.

Responsibilities

  • Manage all phases of incident response for database systems including OracleDB, MSSQL, and MongoDB
  • Serve as the primary Incident Commander during critical events (P1/P2), coordinating resolution activities and directing technical resources
  • Ensure quick service restoration with minimal effect on business functions
  • Communicate efficiently and consistently with stakeholders, leadership, and clients
  • Collaborate with multiple support groups and operate across different time zones in a 24x7 support environment
  • Set up and run war rooms, assign responsibilities, and oversee progress until incidents are closed
  • Keep incident logs, classifications, priorities, and documentation up to date in ITSM solutions such as ServiceNow
  • Lead and record Post-Incident Reviews (PIRs), conduct root cause investigations, and track follow-up actions
  • Review incident data to spot trends and launch Problem Management efforts to avoid repeat issues
  • Work closely with Service Delivery and Engineering teams to refine monitoring, alerting, and incident response workflows
  • Track and enforce compliance with SLAs, OLAs, and key performance indicators like MTTR, response times, and communication targets
  • Share responsibility for weekend and after-hours support as part of an on-call rotation

Requirements

  • Five or more years of experience in IT Operations, Incident Management, or Service Management roles
  • At least one year of experience supervising and guiding development teams
  • Strong grasp of ITIL principles, including Incident, Problem, and Change Management processes
  • Proven track record managing Major Incidents (P1/P2) in enterprise environments, ensuring fast resolution and minimal impact
  • Experience working in continuous, global support settings, demonstrating flexibility and reliability
  • Advanced skills with ITSM tools such as ServiceNow, Remedy, or Jira Service Management for incident management and documentation
  • Ability to lead multidisciplinary teams effectively under pressure
  • Excellent analytical and problem-solving abilities for diagnosing issues and implementing solutions
  • High-level English communication skills (B2+ or above), both spoken and written, for clear stakeholder interaction

Nice to have

  • Experience working with multiple database technologies, including Oracle, MSSQL, and MongoDB
  • Knowledge of cloud environments such as AWS, Azure, or GCP for database deployment and management
  • Familiarity with concepts like high availability, replication, and disaster recovery for databases
  • Background in managing Microsoft SQL Server for database operations and incident resolution
  • Understanding of open-source databases such as MySQL, PostgreSQL, MongoDB, or Cassandra, to support a variety of database platforms

Benefits

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn