Overview

We are seeking a Senior Data Quality Engineer to drive rigorous data validation, SQL-based testing, and quality automation across cloud data environments. You will verify complex transformations, ensure consistency across multiple systems, and support migration and deployment work within the Data Exchange ecosystem. Join a distributed team focused on trustworthy datasets and apply today.

Responsibilities

  • Execute QA validation for SDAP Bulk data products within the Data Exchange ecosystem
  • Validate CMAS (Match and Append) and ensure correct deployment into the Data Exchange pipeline
  • Provide QA support for the Bobsled to Sledhouse migration, ensuring data integrity and functional correctness
  • Verify Sledhouse data products and data fulfillment processes for accuracy and completeness
  • Perform data validation and comparisons across systems, including source input files to BigQuery tables, BigQuery table-to-table checks, and confirmation of data mappings and transformations against specifications
  • Use and enhance the Core Quality Check framework built with PySpark scripts to automate data validation
  • Investigate defects through data analysis, identifying root causes in data pipelines or transformation logic
  • Collaborate with engineering and data teams to triage issues, validate fixes, and confirm production readiness
  • Contribute to test automation for data validation to increase efficiency and coverage
  • Communicate findings, risks, and test results clearly to stakeholders
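
To illustrate the kind of table-to-table validation described above, here is a minimal, hypothetical sketch using SQL `EXCEPT` to diff two tables (shown against an in-memory SQLite database for self-containment; the actual Core Quality Check framework runs PySpark against BigQuery, and its real schemas and APIs are not assumed here):

```python
import sqlite3

# Hypothetical source/target tables standing in for a BigQuery
# source-to-target comparison; schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE source_tbl (id INTEGER, name TEXT);
CREATE TABLE target_tbl (id INTEGER, name TEXT);
INSERT INTO source_tbl VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO target_tbl VALUES (1, 'a'), (2, 'B');
""")

# Rows in the source that never arrived (or arrived altered) in the target.
missing_in_target = cur.execute(
    "SELECT * FROM source_tbl EXCEPT SELECT * FROM target_tbl"
).fetchall()

# Rows in the target with no matching source row (unexpected or mutated data).
unexpected_in_target = cur.execute(
    "SELECT * FROM target_tbl EXCEPT SELECT * FROM source_tbl"
).fetchall()

# Both result sets must be empty for the tables to be considered consistent.
print(missing_in_target)
print(unexpected_in_target)
```

A symmetric pair of `EXCEPT` queries like this catches both dropped and mutated rows in one pass, which is why it is a common building block for automated table-to-table checks.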

Requirements

  • 3+ years of experience in Data Quality Engineering
  • Strong command of SQL, including complex joins across multiple tables and large datasets
  • Hands-on experience with BigQuery or comparable analytical databases
  • Working knowledge of Google Cloud Platform (GCP) services
  • Deep understanding of data validation practices and automated data comparison techniques
  • Solid background in data transformation, validation, and mapping verification using specifications
  • Practical ability to read, understand, and work with PySpark-based quality frameworks
  • Strong data analysis and debugging skills, with an ability to identify defects in data processing pipelines
  • Excellent written and verbal communication skills
  • Upper-Intermediate English proficiency (B2)

Nice to have

  • Proficiency in Python or PySpark development
  • Familiarity with AWS cloud services
  • Background in data engineering or data pipeline testing environments

Benefits

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn