1 day ago

Lead Kernel Engineer/Architect (m/f/d)

Hybrid

Middle

Munich; Frankfurt; Berlin; Germany

Architect, Python, Solution Architecture, Machine Learning, APIs, C++, distributed training, inference, JAX, Kernel, Learning and Development, nvidia triton, Optimization, performance optimization, PySpark

Job description

Overview

We're looking for a Lead Kernel Engineer/Architect to join our team in Germany on a hybrid working model.

Are you passionate about pushing advanced hardware accelerators to their limits? Join us in shaping the future of AI performance and scalability.

As a Lead Kernel Engineer/Architect, you will drive the optimization of critical machine learning operations for large-scale training and inference, working with cutting-edge hardware like TPUs and GPUs, advanced ML models and performance toolchains. Your work will enable faster AI research and production deployments on cloud platforms and within open-source ecosystems.

In this role, you will collaborate with researchers, compiler engineers and framework developers to deliver optimized, high-performance solutions that set the standard for modern AI computation.

Responsibilities

Design and optimize high-performance kernels for TPU and GPU architectures using low-level programming frameworks such as Pallas, Triton or Mosaic
Build and maintain performance infrastructure, including benchmarking suites, autotuning systems, regression testing frameworks and tooling
Collaborate with ML framework developers (e.g., JAX, PyTorch) and compiler teams (XLA/MLIR) to integrate custom kernels and reduce performance bottlenecks
Track advancements in accelerator hardware, compiler technology and AI model design to identify opportunities for kernel-level optimization
Develop clear documentation, APIs and supporting OSS components that improve developer usability and adoption
Analyze and resolve complex performance issues impacting large-scale distributed training and inference systems

Requirements

Bachelor’s degree or equivalent practical experience
12+ years of industry experience in software engineering or systems programming
5+ years of experience in software development using C++ or Python
3+ years of experience in testing, maintaining or launching software products and at least 1 year in software design or architecture
Hands-on experience in performance optimization at the kernel level for accelerators or high-performance systems

Nice to have

Proficiency in low-level accelerator programming (CUDA, Triton, Pallas)
Familiarity with ML frameworks such as JAX or PyTorch and optimization techniques for attention layers, Mixture of Experts (MoE) and precision tuning
Strong understanding of modern hardware accelerators, including pipelining, data movement and heterogeneous compute
Knowledge of compiler principles and intermediate representations (e.g., MLIR, OpenXLA)
Experience building OSS developer infrastructure, APIs and performance-critical libraries
Excellent problem-solving skills and ability to collaborate in cross-functional engineering environments

Benefits (Germany)

30 days holiday per annum
Company Pension Scheme
Regular performance assessments
Discount on Fitness-First Black Membership
bitkom - Corporate Benefits
Employee Stock Purchase Plan (ESPP) (subject to certain eligibility requirements)
Learning and development opportunities, including in-house training and coaching, professional certifications, and courses
Friendly and enjoyable team environment
Regular corporate and social events
Flexible and remote working opportunities
Award-winning workplace: Great Place To Work® certified in 2026, Kununu (Top Company 2022–2026), NewWork Business Award 2025 for outstanding culture, innovation and employee satisfaction.

*All benefits and perks are subject to certain eligibility requirements
