Overview

We are looking for a detail-oriented Data Quality Engineer with strong experience in data validation, SQL-based testing and cloud data environments. The ideal candidate is passionate about data quality, automation and analytical problem-solving, and is comfortable working with complex datasets and large-scale data pipelines. This role involves validating data transformations, ensuring accuracy across multiple systems and supporting migration and deployment activities within the Data Exchange ecosystem. The candidate should demonstrate strong analytical thinking, attention to detail and proactive communication while collaborating with distributed teams.

Responsibilities

  • Perform QA validation for SDAP Bulk data products within the Data Exchange ecosystem
  • Support CMAS (Match and Append) validation and ensure correct deployment into the Data Exchange pipeline
  • Provide QA support during the Bobsled-to-Sledhouse migration, ensuring data integrity and functional correctness
  • Validate Sledhouse data products and data fulfillment processes to ensure accuracy and completeness
  • Execute data validation and comparison across multiple systems, including checks of source input files against BigQuery tables, BigQuery table-to-table comparisons and verification of data mappings and transformations against specifications
  • Use and extend the PySpark-based Core Quality Check framework for automated data validation
  • Conduct data analysis and defect investigation, identifying root causes of issues in data pipelines or transformation logic
  • Collaborate with engineering and data teams to triage defects, validate fixes and ensure production readiness
  • Contribute to automation efforts for data validation and testing to improve efficiency and coverage
  • Communicate findings, risks and testing results clearly to stakeholders
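The cross-system comparison work described above (source files vs. BigQuery tables, table-to-table checks) typically boils down to a keyed row-level diff. A minimal sketch of that idea in plain Python follows; this is illustrative only, not the team's actual Core Quality Check framework, and the function name `compare_tables` and the sample rows are invented for the example.

```python
def compare_tables(source_rows, target_rows, key="id"):
    """Diff two row sets keyed on a primary key.

    Returns keys missing on either side plus field-level mismatches
    for rows present in both.
    """
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    report = {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "missing_in_source": sorted(tgt.keys() - src.keys()),
        "field_mismatches": [],
    }
    # For keys on both sides, compare column by column.
    for k in src.keys() & tgt.keys():
        for col, src_val in src[k].items():
            tgt_val = tgt[k].get(col)
            if src_val != tgt_val:
                report["field_mismatches"].append((k, col, src_val, tgt_val))
    return report

# Hypothetical sample data standing in for a source file and a BigQuery table.
source = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
target = [{"id": 1, "amount": 100}, {"id": 3, "amount": 75}]
result = compare_tables(source, target)
```

In a real pipeline the same logic would run over PySpark DataFrames rather than Python dicts, so that comparisons scale to large tables.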

Requirements

  • 2+ years of experience in Data Quality Engineering
  • Strong knowledge of SQL, including working with complex joins, multiple tables and large datasets
  • Experience working with BigQuery or similar analytical databases
  • Familiarity with Google Cloud Platform (GCP) services
  • Expertise in data validation and automated data comparison techniques
  • Understanding of data transformation, validation and mapping verification against specifications
  • Capability to understand and work with PySpark-based quality frameworks
  • Strong data analysis and debugging skills, with the ability to identify defects in data processing pipelines
  • Excellent communication skills, both written and verbal
  • Upper-Intermediate English language proficiency (B2)
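One common SQL pattern behind the table-to-table comparisons this role calls for is a symmetric EXCEPT: rows in one table but not the other, in both directions. A minimal sketch follows, using sqlite3 purely as a local stand-in for BigQuery; the table names `staging_orders` and `prod_orders` and their contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (id INTEGER, amount INTEGER);
    CREATE TABLE prod_orders (id INTEGER, amount INTEGER);
    INSERT INTO staging_orders VALUES (1, 100), (2, 250);
    INSERT INTO prod_orders VALUES (1, 100), (2, 999);
""")

# Rows present in staging but not in prod, and vice versa, labeled by side.
diff_sql = """
    SELECT 'staging_only' AS side, * FROM (
        SELECT id, amount FROM staging_orders
        EXCEPT
        SELECT id, amount FROM prod_orders)
    UNION ALL
    SELECT 'prod_only' AS side, * FROM (
        SELECT id, amount FROM prod_orders
        EXCEPT
        SELECT id, amount FROM staging_orders)
"""
diff_rows = conn.execute(diff_sql).fetchall()
```

On BigQuery the equivalent query would use `EXCEPT DISTINCT`; the two-sided structure is the same.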

Nice to have

  • Proficiency in Python or PySpark development
  • Familiarity with AWS cloud services
  • Background in data engineering or data pipeline testing environments

Benefits

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn