Overview

We are seeking a highly skilled and proactive Lead Incident Specialist to join our team.

This role is critical to ensuring service stability and rapid recovery across a 24x7 global support model. It is primarily aligned with the Americas timezone and supports after-hours and weekend operations as required.

Responsibilities

Manage all phases of incident response for database systems, including Oracle, MSSQL, and MongoDB
Serve as the primary Incident Commander during critical events (P1/P2), coordinating resolution activities and directing technical resources
Ensure quick service restoration with minimal effect on business functions
Communicate efficiently and consistently with stakeholders, leadership, and clients
Collaborate with multiple support groups and operate across different time zones in a 24x7 support environment
Set up and run war rooms, assign responsibilities, and oversee progress until incidents are closed
Keep incident logs, classifications, priorities, and documentation up to date in ITSM solutions such as ServiceNow
Lead and record Post-Incident Reviews (PIRs), conduct root cause investigations, and track follow-up actions
Review incident data to spot trends and launch Problem Management efforts to avoid repeat issues
Work closely with Service Delivery and Engineering teams to refine monitoring, alerting, and incident response workflows
Track and enforce compliance with SLAs, OLAs, and key performance indicators such as MTTR, response times, and communication targets
Share responsibility for weekend and after-hours support as part of an on-call rotation

Requirements

Five or more years of experience in IT Operations, Incident Management, or Service Management roles
At least one year of experience supervising and guiding development teams
Strong grasp of ITIL principles, including Incident, Problem, and Change Management processes
Proven track record managing Major Incidents (P1/P2) in enterprise environments, ensuring fast resolution and minimal impact
Experience working in continuous, global support settings, demonstrating flexibility and reliability
Advanced skills with ITSM tools such as ServiceNow, Remedy, or Jira Service Management for incident management and documentation
Ability to lead multidisciplinary teams effectively under pressure
Excellent analytical and problem-solving abilities for diagnosing issues and implementing solutions
High-level English communication skills (B2+ or above), both spoken and written, for clear stakeholder interaction

Nice to have

Experience working with multiple database technologies, including Oracle, MSSQL, and MongoDB
Knowledge of cloud environments such as AWS, Azure, or GCP for database deployment and management
Familiarity with concepts like high availability, replication, and disaster recovery for databases
Background in managing Microsoft SQL Server for database operations and incident resolution
Understanding of open-source databases such as MySQL, PostgreSQL, MongoDB, or Cassandra for supporting a variety of database platforms

Benefits

International projects with top brands
Work with global teams of highly skilled, diverse peers
Healthcare benefits
Employee financial programs
Paid time off and sick leave
Upskilling, reskilling and certification courses
Unlimited access to the LinkedIn Learning library and 22,000+ courses
Global career opportunities
Volunteer and community involvement opportunities
EPAM Employee Groups
Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
