Overview

We are seeking an experienced Senior Data Engineer to join our team, with expert-level PySpark skills and hands-on experience building ETL pipelines, data lake architectures, and data feed integrations on AWS. You will work with both structured and unstructured data, ingesting it into AWS from multiple on-premises and enterprise data sources such as SAP, Intelex, SQL databases, and OSI PI. This role offers the opportunity to contribute to large-scale data solutions and collaborate with cross-functional teams in a dynamic environment.

Responsibilities

Design, develop, and optimize ETL pipelines using PySpark and AWS Glue Jobs to process large volumes of structured and unstructured data
Orchestrate data workflows with Apache Airflow, ensuring reliable scheduling, dependency management, and robust error handling
Build and maintain data feeds from on-premises and enterprise systems into AWS data lake environments
Integrate with enterprise data sources including SAP for ERP and operational data, Intelex for environmental, health, safety, and quality data, SQL databases for relational data, and OSI PI for real-time industrial and process historian data
Develop and manage API interactions to extract data from on-premises services into AWS
Handle data extraction, transformation, and loading across various formats and protocols
Support the design and maintenance of AWS data lake architectures using Amazon S3, AWS Glue, and Lake Formation
Ensure data is cataloged, partitioned, and optimized for analytics and reporting
Implement data quality checks, validation, and lineage tracking across all pipelines

Requirements

Minimum 3 years of experience in data engineering roles
Advanced proficiency in Python and PySpark for data processing and pipeline development
Strong background in Extract, Transform, Load (ETL) processes
Experience orchestrating workflows with Apache Airflow
Proven track record building production-grade data pipelines on AWS
Hands-on experience with AWS Glue Jobs for ETL processing
Familiarity with Amazon S3, data lake patterns, and data cataloging techniques
Experience using AWS-native monitoring and operational tools
Skilled in integrating with enterprise systems via APIs, JDBC, or native connectors, including SAP, Intelex, SQL databases, and OSI PI
Ability to work with both structured and unstructured data formats
Excellent documentation, communication, and collaboration skills
English communication skills at B2+ level or higher, both written and spoken

Nice to have

Familiarity with energy, oil & gas, or industrial data environments
Understanding of Drilling and Completions data flows and terminology

Benefits

International projects with top brands
Work with global teams of highly skilled, diverse peers
Healthcare benefits
Employee financial programs
Paid time off and sick leave
Upskilling, reskilling and certification courses
Unlimited access to the LinkedIn Learning library and 22,000+ courses
Global career opportunities
Volunteer and community involvement opportunities
EPAM Employee Groups
Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Senior Data Engineer (Python & AWS)

Job description