Overview
We are seeking an experienced Senior Data Engineer to join our team, bringing expert-level skills in PySpark and hands-on experience building ETL pipelines, data lake architectures, and data feed integrations on AWS. You will work with both structured and unstructured data, ingesting it into AWS from multiple on-premises and enterprise data sources such as SAP, Intelex, SQL databases, and OSI PI. This role offers the opportunity to contribute to large-scale data solutions and collaborate with cross-functional teams in a dynamic environment.
Responsibilities
- Design, develop, and optimize ETL pipelines using PySpark and AWS Glue Jobs to process large volumes of structured and unstructured data
- Orchestrate data workflows with Apache Airflow, ensuring reliable scheduling, dependency management, and robust error handling
- Build and maintain data feeds from on-premises and enterprise systems into AWS data lake environments
- Integrate with enterprise data sources including SAP for ERP and operational data, Intelex for environmental, health, safety, and quality data, SQL databases for relational data, and OSI PI for real-time industrial and process historian data
- Develop and manage API interactions to extract data from on-premises services into AWS
- Handle data extraction, transformation, and loading across various formats and protocols
- Support the design and maintenance of AWS data lake architectures using Amazon S3, AWS Glue, and Lake Formation
- Ensure data is cataloged, partitioned, and optimized for analytics and reporting
- Implement data quality checks, validation, and lineage tracking across all pipelines
Requirements
- Minimum 3 years of experience in data engineering roles
- Advanced proficiency in Python and PySpark for data processing and pipeline development
- Strong background in Extract, Transform, Load (ETL) processes
- Experience orchestrating workflows with Apache Airflow
- Proven track record building production-grade data pipelines on AWS
- Hands-on experience with AWS Glue Jobs for ETL processing
- Familiarity with Amazon S3, data lake patterns, and data cataloging techniques
- Experience using AWS-native monitoring and operational tools
- Skilled in integrating with enterprise systems via APIs, JDBC, or native connectors, including SAP, Intelex, SQL databases, and OSI PI
- Ability to work with both structured and unstructured data formats
- Excellent documentation, communication, and collaboration skills
- English communication skills at B2+ level or higher, both written and spoken
Nice to have
- Familiarity with energy, oil & gas, or industrial data environments
- Understanding of Drilling and Completions data flows and terminology
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn