Overview

We're looking for a Lead Kernel Engineer/Architect to join our team in the Netherlands in a hybrid working model.

Are you passionate about pushing advanced hardware accelerators to their limits? Join us in shaping the future of AI performance and scalability.

As a Lead Kernel Engineer/Architect, you will drive the optimization of critical machine learning operations for large-scale training and inference, working with cutting-edge hardware like TPUs and GPUs, advanced ML models and performance toolchains. Your work will enable faster AI research and production deployments on cloud platforms and within open-source ecosystems.

In this role, you will collaborate with researchers, compiler engineers and framework developers to deliver optimized, high-performance solutions that set the standard for modern AI computation.

Responsibilities

  • Design and optimize high-performance kernels for TPU and GPU architectures using low-level programming frameworks such as Pallas, Triton or Mosaic
  • Build and maintain performance infrastructure, including benchmarking suites, autotuning systems, regression testing frameworks and tooling
  • Collaborate with ML framework developers (e.g., JAX, PyTorch) and compiler teams (XLA/MLIR) to integrate custom kernels and reduce performance bottlenecks
  • Track advancements in accelerator hardware, compiler technology and AI model design to identify opportunities for kernel-level optimization
  • Develop clear documentation, APIs and supporting OSS components that improve developer usability and adoption
  • Analyze and resolve complex performance issues impacting large-scale distributed training and inference systems

Requirements

  • Bachelor’s degree or equivalent practical experience
  • 12+ years of industry experience in software engineering or systems programming
  • 5+ years of experience in software development using C++ or Python
  • 3+ years of experience in testing, maintaining or launching software products and at least 1 year in software design or architecture
  • Hands-on experience in performance optimization at the kernel level for accelerators or high-performance systems

Nice to have

  • Proficiency in low-level accelerator programming (CUDA, Triton, Pallas)
  • Familiarity with ML frameworks such as JAX or PyTorch and optimization techniques for attention layers, Mixture of Experts (MoE) and precision tuning
  • Strong understanding of modern hardware accelerators, including pipelining, data movement and heterogeneous compute
  • Knowledge of compiler principles and intermediate representations (e.g., MLIR, OpenXLA)
  • Experience building OSS developer infrastructure, APIs and performance-critical libraries
  • Excellent problem-solving skills and ability to collaborate in cross-functional engineering environments

Netherlands

  • 26 paid holiday days
  • Pension plan
  • Disability insurance (WGA Shortfall insurance)
  • Long-term disability insurance (WIA Top-up insurance)
  • EPAM Employee Stock Purchase Plan (ESPP)
  • Commuting cost reimbursement
  • Laptop + corporate SIM card + corporate mobile device (subject to certain eligibility requirements)
  • Bike lease
  • Employee Assistance Program
  • Corporate Programs including Employee Referral Program with rewards
  • Learning and development opportunities including in-house training and coaching, professional certifications, and courses
*All benefits and perks are subject to certain eligibility requirements.