Overview

We're looking for a Lead Data Engineer (Databricks, PySpark) to join our team in London, UK in a hybrid working mode.

In this role, you will help shape and deliver next-generation data platforms. You will be hands-on in developing, implementing and optimizing scalable ETL workflows and data pipelines, leveraging the full capabilities of Databricks and modern cloud technologies. You will play a key part in the transition to a robust Lakehouse architecture, working closely with cross-functional teams in an agile environment.

This position is ideal for a data engineering leader who enjoys solving complex challenges, mentoring others and working at the forefront of Databricks technology. Experience with any major cloud provider is welcome, but a strong focus on Databricks is essential.

Responsibilities

  • Design, develop and maintain production-grade data applications, reusable frameworks and scalable data pipelines using Databricks, PySpark and Python/Scala
  • Lead the architectural design and modernization of data platforms to a Lakehouse architecture leveraging Databricks-native technologies such as Delta Lake and Unity Catalog
  • Drive advanced Spark performance tuning, including mitigating data skew, optimizing Catalyst query execution plans and managing cluster compute and memory efficiently for high-volume workloads
  • Champion modern software engineering practices within the data ecosystem, including CI/CD pipelines, Infrastructure as Code (IaC), rigorous code reviews, automated testing and version control
  • Implement secure, scalable and highly available data solutions leveraging integrations between Databricks and major cloud services (AWS, Azure or GCP)
  • Architect and support AI-driven data solutions, including integrating Large Language Models (LLMs), building agentic workflows and operationalizing GenAI or machine learning models within Databricks pipelines
  • Act as a Technical Lead in an agile environment, collaborating with architects and product owners to decompose complex business requirements into actionable technical strategies, Epics and User Stories
  • Mentor and upskill engineers, fostering a culture of engineering excellence, continuous learning and technical innovation
  • Serve as a key technical liaison, translating complex architectural decisions, data concepts and system capabilities for both technical and non-technical stakeholders

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering or a related field
  • Deep, hands-on proficiency in PySpark, with a proven ability to tackle advanced performance tuning, data skew handling, memory management and Catalyst optimizer troubleshooting
  • Extensive experience building production workloads on Databricks, including knowledge of Databricks Workflows, Delta Lake and Unity Catalog for governance and security
  • Demonstrable experience designing and migrating to Lakehouse architectures utilizing open table formats such as Delta Lake or Apache Iceberg
  • Strong hands-on experience integrating Databricks with native cloud services on AWS, Azure or GCP
  • Advanced programming skills in Python (Scala is a plus) with strong understanding of object-oriented and functional programming principles
  • Proven track record of applying software engineering standards to data pipelines, including CI/CD, Infrastructure as Code (e.g. Terraform), version control (Git) and rigorous code reviews
  • Solid background in implementing automated testing frameworks and data quality validation within pipelines
  • Proven experience as a Senior or Lead Engineer capable of driving technical strategy, making architectural decisions and decomposing complex solutions into Agile Epics and User Stories
  • Strong ability to articulate complex technical concepts and trade-offs clearly to both technical peers and non-technical stakeholders
  • Advantageous: Official Databricks certifications (e.g. Certified Data Engineer Professional, Spark Developer)
  • Highly desirable: Hands-on experience with, or a strong interest in, AI and agentic workflows, including operationalizing LLMs, using frameworks such as LangChain or LlamaIndex, or leveraging Databricks ML/MosaicML for GenAI applications

Benefits (UK)

  • EPAM Employee Stock Purchase Plan (ESPP)
  • Protection benefits including life assurance, income protection and critical illness cover
  • Private medical insurance and dental care
  • Employee Assistance Program
  • Competitive group pension plan
  • Cyclescheme, Techscheme and season ticket loans
  • Various perks, such as free in-office lunch on Wednesdays, on-site massages and regular social events
  • Learning and development opportunities including in-house training and coaching, professional certifications, over 22,000 courses on LinkedIn Learning Solutions and much more
  • If otherwise eligible, participation in the discretionary annual bonus program
  • If otherwise eligible and hired into a qualifying level, participation in the discretionary Long-Term Incentive (LTI) Program
*All benefits and perks are subject to certain eligibility requirements.