Overview
We are seeking a highly skilled and proactive Lead Incident Specialist to join our team.
This role is critical to ensuring service stability and rapid recovery across a 24x7 global support model, primarily aligned with Americas time zones, while also supporting after-hours and weekend operations as required.
Responsibilities
- Manage all phases of incident response for database systems, including Oracle Database, MSSQL, and MongoDB
- Serve as the primary Incident Commander during critical events (P1/P2), coordinating resolution activities and directing technical resources
- Ensure rapid service restoration with minimal impact on business operations
- Communicate efficiently and consistently with stakeholders, leadership, and clients
- Collaborate with multiple support groups across time zones in a 24x7 support environment
- Set up and run war rooms, assign responsibilities, and oversee progress until incidents are closed
- Keep incident logs, classifications, priorities, and documentation up to date in ITSM solutions such as ServiceNow
- Lead and record Post-Incident Reviews (PIRs), conduct root cause investigations, and track follow-up actions
- Review incident data to spot trends and launch Problem Management efforts to avoid repeat issues
- Work closely with Service Delivery and Engineering teams to refine monitoring, alerting, and incident response workflows
- Track and enforce compliance with SLAs, OLAs, and key performance indicators like MTTR, response times, and communication targets
- Share responsibility for weekend and after-hours support as part of an on-call rotation
Requirements
- Five or more years of experience in IT Operations, Incident Management, or Service Management roles
- At least one year of experience supervising and guiding development teams
- Strong grasp of ITIL principles, including Incident, Problem, and Change Management processes
- Proven track record managing Major Incidents (P1/P2) in enterprise environments, ensuring fast resolution and minimal impact
- Experience working in 24x7 global support environments, demonstrating flexibility and reliability
- Advanced skills with ITSM tools such as ServiceNow, Remedy, or Jira Service Management for incident management and documentation
- Ability to lead multidisciplinary teams effectively under pressure
- Excellent analytical and problem-solving abilities for diagnosing issues and implementing solutions
- High-level English communication skills (B2+ or above), both spoken and written, for clear stakeholder interaction
Nice to have
- Experience working with multiple database technologies, including Oracle, MSSQL, and MongoDB
- Knowledge of cloud environments such as AWS, Azure, or GCP for database deployment and management
- Familiarity with concepts like high availability, replication, and disaster recovery for databases
- Background in managing Microsoft SQL Server for database operations and incident resolution
- Understanding of open-source databases such as MySQL, PostgreSQL, MongoDB, or Cassandra for supporting a variety of database platforms
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling, and certification courses
- Unlimited access to the LinkedIn Learning library of 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek, and LinkedIn