Overview

We're looking for a Lead Data Engineer (Databricks, PySpark) to join our team in London, UK in a hybrid working mode.

In this role, you will help shape and deliver next-generation data platforms. You will be hands-on in developing, implementing and optimizing scalable ETL workflows and data pipelines, leveraging the full capabilities of Databricks and modern cloud technologies. You will play a key part in the transition to a robust Lakehouse architecture, working closely with cross-functional teams in an agile environment.

This position is ideal for a data engineering leader who enjoys solving complex challenges, mentoring others and working at the forefront of Databricks technology. Experience with any major cloud provider is welcome, but a strong focus on Databricks is essential.

Responsibilities

  • Design, develop and maintain production-grade data applications, reusable frameworks and scalable data pipelines using Databricks, PySpark and Python/Scala
  • Lead the architectural design and modernization of data platforms to a Lakehouse architecture leveraging Databricks-native technologies such as Delta Lake and Unity Catalog
  • Drive advanced Spark performance tuning, including mitigating data skew, optimizing Catalyst query execution plans and managing cluster compute and memory efficiently for high-volume workloads
  • Champion modern software engineering practices within the data ecosystem, including CI/CD pipelines, Infrastructure as Code (IaC), rigorous code reviews, automated testing and version control
  • Implement secure, scalable and highly available data solutions leveraging integrations between Databricks and major cloud services (AWS, Azure or GCP)
  • Architect and support AI-driven data solutions, including integrating Large Language Models (LLMs), building agentic workflows and operationalizing GenAI or machine learning models within Databricks pipelines
  • Act as a Technical Lead in an agile environment, collaborating with architects and product owners to decompose complex business requirements into actionable technical strategies, Epics and User Stories
  • Mentor and upskill engineers, fostering a culture of engineering excellence, continuous learning and technical innovation
  • Serve as a key technical liaison, translating complex architectural decisions, data concepts and system capabilities for both technical and non-technical stakeholders

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering or a related field
  • Deep, hands-on proficiency in PySpark, with a proven ability to tackle advanced performance tuning, data skew handling, memory management and Catalyst optimizer troubleshooting
  • Extensive experience building production workloads on Databricks, including knowledge of Databricks Workflows, Delta Lake and Unity Catalog for governance and security
  • Demonstrable experience designing and migrating to Lakehouse architectures utilizing open table formats such as Delta Lake or Apache Iceberg
  • Strong hands-on experience integrating Databricks with native cloud services on AWS, Azure or GCP
  • Advanced programming skills in Python (Scala is a plus) with strong understanding of object-oriented and functional programming principles
  • Proven track record of applying software engineering standards to data pipelines, including CI/CD, Infrastructure as Code (e.g. Terraform), version control (Git) and rigorous code reviews
  • Solid background in implementing automated testing frameworks and data quality validation within pipelines
  • Proven experience as a Senior or Lead Engineer capable of driving technical strategy, making architectural decisions and decomposing complex solutions into Agile Epics and User Stories
  • Strong ability to articulate complex technical concepts and trade-offs clearly to both technical peers and non-technical stakeholders
  • Advantageous: Official Databricks certifications (e.g. Certified Data Engineer Professional, Spark Developer)
  • Highly desirable: Hands-on experience with, or a strong interest in, AI and agentic workflows, including operationalizing LLMs, using frameworks such as LangChain or LlamaIndex, or leveraging Databricks ML/MosaicML for GenAI applications

Benefits (UK)

  • EPAM Employee Stock Purchase Plan (ESPP)
  • Protection benefits including life assurance, income protection and critical illness cover
  • Private medical insurance and dental care
  • Employee Assistance Program
  • Competitive group pension plan
  • Cyclescheme, Techscheme and season ticket loans
  • Various perks, such as free in-office lunch on Wednesdays, on-site massages and regular social events
  • Learning and development opportunities including in-house training and coaching, professional certifications, over 22,000 courses on LinkedIn Learning Solutions and much more
  • If otherwise eligible, participation in the discretionary annual bonus program
  • If otherwise eligible and hired into a qualifying level, participation in the discretionary Long-Term Incentive (LTI) Program
*All benefits and perks are subject to certain eligibility requirements.