<aside>
<img src="/icons/city_gray.svg" alt="/icons/city_gray.svg" width="40px" /> Mako, Inc.
</aside>
<aside>
<img src="/icons/flash_gray.svg" alt="/icons/flash_gray.svg" width="40px" /> Apply Now!
</aside>
Summary
Our R&D team is seeking expert-level GPU kernel engineers to help build the world’s best LLMs and agents for GPU kernel generation.
The goal is simple: design an AI agent that writes and optimizes kernels in the same way you do. You will collaborate with the training team to define robust evaluation, validation, and reward models that will be used to train LLMs in the art of GPU kernel engineering. You will also contribute to the AI agent architecture itself, defining the workflows that enable an LLM to discover and implement high performance GPU kernels.
This job is based in either Gdansk or New York City. Remote work will be considered for exceptional candidates.
About Mako
Mako is a venture-backed AI lab building tools to automate algorithm discovery and GPU performance engineering. There are two core components:
- MakoGenerate writes GPU kernels in CUDA, HIP, and Triton using LLMs
- MakoOptimize automatically selects and swaps GPU kernels and tunes inference-engine (vLLM, SGLang, etc.) hyperparameters to optimize performance
Responsibilities
- Explore and analyze performance bottlenecks in ML training and inference.
- Develop and optimize high-performance computing kernels in Triton, CUDA, and/or ROCm.
- Implement programming solutions in C/C++ and Python.
- Deep dive into GPU performance optimizations to maximize efficiency and speed.
- Collaborate with the team to extend and improve existing machine learning compilers or frameworks such as MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT. (This is optional but beneficial.)
Qualifications
- Bachelor's, Master's, or PhD degree in Computer Science, Electrical Engineering, or a related field.
- Strong programming skills in C/C++ and Python.
- Deep understanding of, and hands-on experience with, GPU performance optimization.
- Proven experience with kernel optimizations on CUDA, ROCm, or other accelerators.
- General experience with the training and deployment of ML models.
- Experience with distributed systems development or distributed ML workloads.
Bonus Points
- Experience with innovative OSS projects like FlashAttention, mlc-llm, and vLLM.
- Experience with machine learning compilers or frameworks such as TVM, MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT.
Our Benefits
- Competitive salary
- Incredibly generous equity grants
- Comprehensive health insurance coverage for you and your family
- Remote work option for exceptional candidates
- Generous vacation and paid time off policy
- Modern and comfortable work environment with state-of-the-art equipment and facilities
To Apply
Fill out this form