GPU Kernel Engineer

Mako, Inc.

Summary

Our R&D team is seeking expert-level GPU kernel engineers to help build the world's best LLMs and agents for GPU kernel generation.

The goal is simple: design an AI agent that writes and optimizes kernels the way you do. You will collaborate with the training team to define robust evaluation and validation methods, along with the reward models used to train LLMs in the art of GPU kernel engineering. You will also contribute to the architecture of the AI agent itself, defining the workflows that enable an LLM to discover and implement high-performance GPU kernels.

This job is based in either Gdansk or New York City. Remote work will be considered for exceptional candidates.

About Mako

Mako is a venture-backed AI lab building tools to automate algorithm discovery and GPU performance engineering. Its tooling has two core components:

MakoGenerate uses LLMs to write GPU kernels in CUDA, HIP, and Triton.
MakoOptimize automatically selects and swaps GPU kernels while tuning the hyperparameters of inference engines (vLLM, SGLang, etc.) to optimize performance.

Responsibilities

Explore and analyze performance bottlenecks in ML training and inference.
Develop and optimize high-performance computing kernels in Triton, CUDA, and/or ROCm.
Implement programming solutions in C/C++ and Python.
Deep dive into GPU performance optimizations to maximize efficiency and speed.
Collaborate with the team to extend and improve existing machine learning compilers or frameworks such as MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT (optional but beneficial).

Qualifications

Bachelor's, Master's, or PhD degree in Computer Science, Electrical Engineering, or a related field.
Strong programming skills in C/C++ and Python.
Deep understanding of, and hands-on experience with, GPU performance optimization.
Proven experience optimizing kernels with CUDA, ROCm, or on other accelerators.
General experience with the training and deployment of ML models.
Experience with distributed systems development or distributed ML workloads.

Bonus Points

Experience with innovative open-source projects such as FlashAttention, mlc-llm, vLLM, and SGLang.
Experience with machine learning compilers or frameworks such as TVM, MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT.

Our Benefits

Competitive salary
Incredibly generous equity grants
Comprehensive health insurance coverage for you and your family
Remote work option for exceptional candidates
Generous vacation and paid time off policy
Modern and comfortable work environment with state-of-the-art equipment and facilities

To Apply

Fill out this form