Overview
We're looking for a Lead Kernel Engineer/Architect to join our team in the Netherlands on a hybrid working basis.
Are you passionate about pushing advanced hardware accelerators to their limits? Join us in shaping the future of AI performance and scalability.
As a Lead Kernel Engineer/Architect, you will drive the optimization of critical machine learning operations for large-scale training and inference, working with cutting-edge hardware like TPUs and GPUs, advanced ML models and performance toolchains. Your work will enable faster AI research and production deployments on cloud platforms and within open-source ecosystems.
In this role, you will collaborate with researchers, compiler engineers and framework developers to deliver optimized, high-performance solutions that set the standard for modern AI computation.
Responsibilities
- Design and optimize high-performance kernels for TPU and GPU architectures using low-level programming frameworks such as Pallas, Triton or Mosaic
- Build and maintain performance infrastructure, including benchmarking suites, autotuning systems, regression testing frameworks and tooling
- Collaborate with ML framework developers (e.g., JAX, PyTorch) and compiler teams (XLA/MLIR) to integrate custom kernels and reduce performance bottlenecks
- Track advancements in accelerator hardware, compiler technology and AI model design to identify opportunities for kernel-level optimization
- Develop clear documentation, APIs and supporting OSS components that improve developer usability and adoption
- Analyze and resolve complex performance issues impacting large-scale distributed training and inference systems
Requirements
- Bachelor’s degree or equivalent practical experience
- 12+ years of industry experience in software engineering or systems programming
- 5+ years of experience in software development using C++ or Python
- 3+ years of experience in testing, maintaining or launching software products and at least 1 year in software design or architecture
- Hands-on experience in performance optimization at the kernel level for accelerators or high-performance systems
Nice to have
- Proficiency in low-level accelerator programming (CUDA, Triton, Pallas)
- Familiarity with ML frameworks such as JAX or PyTorch and optimization techniques for attention layers, Mixture of Experts (MoE) and precision tuning
- Strong understanding of modern hardware accelerators, including pipelining, data movement and heterogeneous compute
- Knowledge of compiler principles and intermediate representations (e.g., MLIR, OpenXLA)
- Experience building OSS developer infrastructure, APIs and performance-critical libraries
- Excellent problem-solving skills and ability to collaborate in cross-functional engineering environments
Benefits in the Netherlands
- 26 paid holiday days
- Pension plan
- Disability insurance (WGA Shortfall insurance)
- Long-term disability insurance (WIA Top up insurance)
- EPAM Employee Stock Purchase Plan (ESPP)
- Reimbursement of commuting costs
- Laptop, corporate SIM card and corporate mobile device (subject to certain eligibility requirements)
- Bike lease
- Employee Assistance Program
- Corporate programs, including an Employee Referral Program with rewards
- Learning and development opportunities, including in-house training and coaching, professional certifications and courses